Closed Bug 811593 Opened 12 years ago Closed 6 years ago

XHR file reading via app:// protocol on Unagi is very slow

Categories

(Firefox OS Graveyard :: General, defect, P3)

x86
macOS
defect

Tracking

(blocking-basecamp:-)

RESOLVED WONTFIX
blocking-basecamp -

People

(Reporter: zbraniecki, Unassigned)

References


Details

(Keywords: perf, Whiteboard: [c=progress p= u= s=])

Attachments

(1 file)

As part of an effort to speed up Gaia's l10n I did some basic measurements (bug 809600 comment 11), and it seems that file reading via XHR takes between 100ms and 300ms (the number varies).

Is it possible that there's some bug in either the kernel code or Gonk that makes synchronous and asynchronous reads this slow? Or maybe in the local server behind the app:// protocol? The example URL used for the XHR load is app://settings.gaiamobile.org/locales/locales.ini

On a 5400rpm HDD the same loads take 1ms.
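
For reference, the measurement is just an XHR wrapped in performance.now() calls; a minimal sketch (not the exact code from bug 809600 comment 11, and timeXHR is only an illustrative helper):

// Minimal sketch: time a single async XHR against an app:// resource.
function timeXHR(url, callback) {
  var start = window.performance.now();
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.overrideMimeType('text/plain; charset=utf-8');
  xhr.onload = function () {
    callback(url, window.performance.now() - start);
  };
  xhr.onerror = function () {
    callback(url, -1); // signal failure
  };
  xhr.send(null);
}

timeXHR('app://settings.gaiamobile.org/locales/locales.ini', function (url, ms) {
  console.log('XHR ' + url + ' took ' + ms.toFixed(2) + 'ms');
});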

Simple I/O reported by dhylands on Unagi:
[18:13:39] <dhylands>	 Yeah - so I'm getting around 10msec to cat 15k to /dev/null
[18:13:52] <dhylands>	 And around 220msec to do 2.5Mb

That's a critical issue for localization. With the current timings we have to work around the performance bottleneck by preprocessing and bundling the resources into the HTML files.
(Actually, the numbers above are from an Otoro, not an Unagi.)
Summary: File reading on Unagi is very slow → XHR file reading via app:// protocol on Unagi is very slow
Blocks: 809600
[18:19:13] <dhylands>	 Unagi seems to be MUCH slower
[18:19:25] <dhylands>	 60 msec to cat 15K
The file size is not a significant factor:
locales.ini is 171 bytes, settings.en-US.properties is 17KB, and both load times are similar.

The last test gave me an average time of 307.09ms (no matter whether I load sync or async).
The otoro is using a yaffs2 file system, while the unagi is using ext4. Could this explain the difference?
Or it could be that app:// uses the jar: protocol: maybe we actually close the zip connection after loading the page, and each XHR to its resources reopens the zip file from scratch?
blocking-basecamp: --- → ?
For the record, I was doing:

sync; echo 3 > /proc/sys/vm/drop_caches
time cat some-file > /dev/null
blocking-basecamp: ? → ---
Hmmm. I didn't intentionally set blocking-basecamp to --- and I didn't get a collision report. Maybe that's a bug in Bugzilla?
blocking-basecamp: --- → ?
Wondering: what do we do with omni.ja to get around the typical zip perf issues? Taras?
I definitely don't see any improvement between the first and subsequent files read from the same zip via XHR, so if the performance cost involves opening the zip, it seems to be paid for each of the files loaded.
We re-order files in the omni.ja, and we have a bug open to investigate that for application zips.
I wrote a minimal testcase for this bug, and found that if the resource loading is delayed (instead of triggered on onload) it's actually rather fast.

I even switched to manually triggering the whole localization framework's resource loading and got times in the 20-40ms range (per file). That's still much slower than an HDD, but definitely better than the 100-300ms recorded previously.

So, with this patch, the load is triggered on onload, and on my Unagi it takes:

 - cold boot onload: 300-800ms
 - warm boot (killing app) onload: 200ms
 - manual load (no onload): 20-50ms
 - manual reload (file loaded previously): 2-20ms

My initial hypothesis is that during onload more resources are being read and each XHR is competing for access to the zip reader.
But what files are being loaded *post* DOMContentLoaded?

If we could get down to the numbers we see with the manual load, then I believe we're fine.

Could someone help me understand what happens when this minimal testcase is triggered by onload and tries to do XHR? Why is it so much slower than if I trigger it with a button one second later?
Yeah, delaying the l10n library initialization by 5 seconds with setTimeout results in a drop in resource loading time to an average of 36.33ms (max 46ms) for those 6 files.

This means that if the files are loaded right after onload, it takes 4x longer to load them.
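
A sketch of that delayed-initialization experiment; initL10n() is a hypothetical stand-in for the real l10n bootstrap, and the 5-second delay matches the test above:

// Instead of kicking off l10n resource loading directly from the load handler,
// wait 5 seconds and then time how long the resources take.
// initL10n(callback) is hypothetical; the real Gaia entry point differs.
window.addEventListener('load', function () {
  setTimeout(function () {
    var start = window.performance.now();
    initL10n(function onAllResourcesLoaded() {
      var elapsed = window.performance.now() - start;
      console.log('l10n resources loaded in ' + elapsed.toFixed(2) + 'ms');
    });
  }, 5000);
});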
(In reply to Zbigniew Braniecki [:gandalf] from comment #12)
> yeah, delaying l10n library initialization by 5seconds with setTimeout
> results in a drop in resource loading time to avg. 36.33 (max 46ms) for
> those 6 files.
> 
> This means that if the files are loaded close after onload it takes 4x more
> to load them.

we could be opening and closing a zip file every time you access a resource from it on startup. printf the filename from 
http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsJAR.cpp#1098
to see if we are opening the same jar files multiple times.

If so, fixing bug 504217 would fix this.
Performance might simply be limited by the SD card; they are only specified for 10 MB/sec or less.

  http://en.wikipedia.org/wiki/Secure_Digital#Speeds
Onboard flash is even slower; I have seen read access from flash peak at around 1MB/s. The zip file issue is real, though. Let's fix that.
Yeah, once I start to see noticeable performance differences between different file sizes, I will believe that we're getting close to the SD card limits. But for now it seems like it's all about per-file access overhead, and that may be the zip issue.
Not blocking on this since I don't think we would hold the release on this. Also the ideas here seem somewhat speculative.

That said, anything that can improve zip reading performance would be great. So don't hesitate to renom if you have numbers, or ask for approval if you have patches.
blocking-basecamp: ? → -
(In reply to Taras Glek (:taras) from comment #13)
> we could be opening and closing a zip file every time you access a resource
> from it on startup. printf the filename from 
> http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsJAR.cpp#1098
> to see if we are opening the same jar files multiple times.

No, we're opening application.zip only once, and then each hit is from cache.
(at least on desktop build. I don't know how to collect printf result on unagi)
(In reply to Zbigniew Braniecki [:gandalf] from comment #19)
> (at least on desktop build. I don't know how to collect printf result on
> unagi)

just use printf_stderr() and this will show up in logcat.
Yeah, so it seems that the slow file load is similar across all files loaded at boot time (CSS, JS, and images as well).

I measured it with a window.performance.now() timer fired on the onload event.
First, I removed all CSS and JS files from the Settings app and measured that.
Then I re-added the 12 CSS files and measured again.

The results are a bit noisy, but there's a clear correlation: each CSS file adds ~80ms to the load time (and my custom XHR also takes around 80ms).
Every few reboots the total load is much faster and my custom XHR is also much faster (as if the zip got onto a fast path).

Example results:
1) with 12 CSS files:
 - total: 1006ms, my custom: 100ms
 - total: 800ms, my custom: 82ms
 - total: 1079ms, my custom: 61ms
 - total: 658ms, my custom: 9.61ms

2) Without 12 CSS files:
 - total: 530ms, my custom: 26ms
 - total: 416ms, my custom: 9ms
 - total: 608ms, my custom: 41ms
 - total: 416ms, my custom: 16ms

I rebooted the phone before each test, and I did not include the results from the first run after uploading gaia to the phone (they are different).

So it seems that just by removing the JS files from <head> I lowered the load time from ~150ms to ~80ms, and by removing the CSS files I lowered it to ~30ms.

On top of that the document is ready almost a second faster.

The settings app is loading:
 - 12 CSS files
 - 9 JS files
 - 6 l10n files

It seems that unless we can improve the time it takes to read multiple files simultaneously from the archive during application loading, it makes sense to preprocess each HTML file and inline all of its CSS/JS/l10n assets, which would shave app load time significantly.
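
As a rough illustration of that preprocessing, a Node.js-style build sketch that inlines local stylesheets into the HTML (the file paths and the naive regex are assumptions, not the actual Gaia build step):

// Sketch of the proposed preprocessing: inline every local stylesheet referenced
// from an app's index.html so fewer files have to be read from the zip at launch.
// A real build step would also handle <script src> and the l10n resources.
var fs = require('fs');
var path = require('path');

function inlineStylesheets(htmlPath) {
  var appDir = path.dirname(htmlPath);
  var html = fs.readFileSync(htmlPath, 'utf8');
  // Naive regex: assumes rel="stylesheet" appears before href in each <link> tag.
  return html.replace(
    /<link[^>]+rel=["']stylesheet["'][^>]*href=["']([^"']+)["'][^>]*\/?>/g,
    function (tag, href) {
      var css = fs.readFileSync(path.join(appDir, href), 'utf8');
      return '<style>\n' + css + '\n</style>';
    }
  );
}

fs.writeFileSync('index.bundled.html',
                 inlineStylesheets('apps/settings/index.html'));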
Raw read performance of the drive is around 1.4MB/s, so we should be able to squeeze a lot out here.

root@android:/system/b2g # dd if=./libxul.so of=/dev/null bs=8k count=1024000  
1024000+0 records in
1024000+0 records out
8192000 bytes transferred in 5.621 secs (1457391 bytes/sec)
Actually, if we pick something closer to the flash block size, you'll get much faster performance. On my Otoro, if I use a bs of around 256K, I get around 10 MB/sec.

I copied libxul.so to a different file and did a reboot. libxul.so will be partially in the cache if b2g is running.
Good to know, so the raw I/O should be the issue.

BTW, changing the zip compression on application.zip between NONE and BEST didn't affect the numbers much.
should *not* be the issue.
I turned on the "show time to load" setting and got very consistent timings loading the Settings app:

- 1.8s when running OOP.
- 250ms with OOP disabled.

So my take is that there is nothing to improve on the app:// protocol side; we're slowed down by IPC-related issues.
That's a great catch!

It seems like material for a separate bug, because it's not really affecting resource loading that much.

First of all, what you're observing are warm-boot times (and I can confirm those); in order to measure a cold boot you have to reboot the phone between app launches (there may be a smarter way involving clearing caches, I don't know).

With cold boots I see:

 - 680ms with OOP disabled
 - 659ms with OOP enabled

Unfortunately this does not match the numbers I'm measuring. I don't know what exactly "show time to load" measures, but the numbers I get for the Settings app are:

1) With OOP disabled:
 - window.performance.now() at "onload": 1200ms
 - window.performance.now() at "localized": 6300ms
 - avg. resource load time (for 6 l10n files loaded via XHR): 500ms

2) With OOP enabled:
 - window.performance.now() at "onload": 600ms
 - window.performance.now() at "localized": 4000ms
 - avg. resource load time (for 6 l10n files loaded via XHR): 114ms

So it seems that disabling OOP helps with warm boots, but not so much with getting the "localized" event fired earlier. (That's the event we use to say "the document is ready and localized, please proceed with your code".)

I also tested my custom build vs. a release build, thinking that maybe I was running a debug build, but there's no visible difference.
Yeah, unfortunately it seems that the "time to load" metric is useless here.

I removed everything from Settings, leaving this: http://pastebin.mozilla.org/1943725, and still got the same "time to load" numbers (with or without OOP), so it seems that metric is dominated by things that are not relevant to the actual page loading or resource loading times.
I set up a baseline using the stub index.html above (no resources, nothing).

window.onload:
 - 680ms cold boot
 - 308ms warm boot

That means that on the Unagi, the overhead from loading all the Settings CSS/JS/image resources that delay the window load event is 1200 - 680 = 520ms on a cold boot and 600 - 308 = 292ms on a warm boot.
(In reply to Dave Hylands [:dhylands] from comment #23)
> Actually, if we pick something closer to the flash block size, you'll get
> much faster performance. On my otoro if I use a bs around 256K, I get around
> 10 Mb/sec.
> 
> I copied libxul.so to a different file and did a reboot. libxul.so will be
> partially in the cache if b2g is running.

Yeah, I tested it on Unagi and got even better numbers with bs=256k:

> cd /data/local/webapps/settings.gaiamobile.org
> cat /system/b2g/libxul.so > ./libxul.so
> echo 3 > /proc/sys/vm/drop_caches
> dd if=./libxul.so of=/dev/null bs=256k count=256000000
68080+1 records in
68080+1 records out
17428708 bytes transferred in 0.501 secs (34787840 bytes/sec)

That's over 34MB/s!!!
I created a minimal empty HTML document with a single onload function that dumps performance.timing into the document: http://pastebin.mozilla.org/1944129
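
The pastebin contents aren't reproduced here, but the handler was along these lines (a reconstruction from the description, not the actual testcase):

// Reconstruction of the testcase idea: dump every performance.timing field,
// normalized to navigationStart, into the document on load.
window.onload = function () {
  var t = window.performance.timing;
  var base = t.navigationStart;
  var lines = [];
  for (var key in t) {
    if (typeof t[key] === 'number') {
      lines.push('performance.' + key + '  ' + (t[key] ? t[key] - base : 0));
    }
  }
  var pre = document.createElement('pre');
  pre.textContent = lines.join('\n');
  document.body.appendChild(pre);
};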

The results:

NAME                                     desktop+localHTTP  emulator+local  Unagi+web  Unagi+local
performance.navigationStart              0                  0               0          0
performance.unloadEventStart             3                  0               0          0
performance.unloadEventEnd               3                  0               0          0
performance.fetchStart                   0                  0               1          121
performance.domainLookupStart            1                  0               1          121
performance.domainLookupEnd              1                  0               1          121
performance.connectStart                 0                  0               1          121
performance.connectEnd                   0                  0               1          121
performance.requestStart                 2                  0               1          121
performance.responseStart                2                  0               1          121
performance.responseEnd                  2                  0               1          121
performance.domLoading                   2                  1               221        469
performance.domInteractive               12                 16              334        983
performance.domContentLoadedEventStart   12                 25              332        1181
performance.domContentLoadedEventEnd     14                 25              333        1181
performance.domComplete                  14                 25              366        1189
performance.loadEventStart               14                 25              366        1189

NAME                                     UnaNoOOP+web       UnaNoOOP+local  
performance.navigationStart              0                  0               
performance.unloadEventStart             0                  11               
performance.unloadEventEnd               0                  11              
performance.fetchStart                   0                  11               
performance.domainLookupStart            0                  11              
performance.domainLookupEnd              0                  11              
performance.connectStart                 0                  11              
performance.connectEnd                   0                  11              
performance.requestStart                 0                  11              
performance.responseStart                0                  11              
performance.responseEnd                  0                  11              
performance.domLoading                   229                383              
performance.domInteractive               285                809              
performance.domContentLoadedEventStart   366                1182              
performance.domContentLoadedEventEnd     366                1182            
performance.domComplete                  400                1182              
performance.loadEventStart               400                1182              

"Unagi+web" means that I loaded this code from inside the Gaia browser; "Unagi+local" is a local app launched from the homescreen.

Summary:
 - Turning off OOP definitely helps with file loading time (at least for the main index.html; I didn't measure the subsequent CSS/JS/l10n files yet).
 - There's a visible difference in domLoading between web and local. It may be that loading the web browser gets the DOM engine JITted, but if so, we should do the same for apps.
 - NoOOP seems to be ~10% slower overall, but the shorter file load makes up for it.

The DOM loading issue is a separate bug, but the NoOOP vs. OOP difference in fetchStart is probably what this bug is all about, I believe.
Oh, and all the numbers for the Unagi are for cold boots, averaged over 5 runs.
I'm a bit lost.

OOP clearly makes a difference in the Performance API numbers (i.e. the .html file loading), but when I added my custom XHR file load, the numbers are equally bad with or without OOP.

This is the code I'm testing against now: http://pastebin.mozilla.org/1944164

On my Unagi phone the numbers for the XHR load (cold boot) are as follows:

 - web+NoOOP: 215
 - local+NoOOP: 470
 - web+OOP: 186
 - local+OOP: 308

That still feels like a lot of time to load a single small file via XHR. (On desktop and in the emulator it's about 30ms.)
The web:

On desktop 
 - FirefoxNightly: ~40ms
 - Chrome: ~20ms

on Android (SGS3)
 - Firefox Aurora: ~180ms
 - Chrome: ~140ms

So it seems that for web access over WiFi, both Android and the Unagi show similar latency.
But local access is way slower, both with NoOOP and with OOP.
I filed bug 812441 separately for the DOM slowness and bug 812643 for index.html loading.

This bug remains about the XHR load time.
All my timings for Unagi+local were done against the Settings app. Unfortunately, due to bug 812643 the Settings app is loaded at system boot, which skews the numbers.

On a clean test webapp, loading a local resource via XHR costs ~100ms, and 10 files can be loaded simultaneously, so loading all ten takes just over 120ms.

If we think we can get this number down (loading one XHR resource from app:// takes ~100ms), we can keep this bug open; otherwise we can close it.
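
For reference, the parallel-load test looks roughly like this (the test app URL and file names are placeholders):

// Sketch: fire 10 XHRs at once and report the total wall-clock time,
// to check how much of the ~100ms per-file cost overlaps.
function loadAll(urls, done) {
  var start = window.performance.now();
  var pending = urls.length;
  urls.forEach(function (url) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    xhr.onloadend = function () {
      if (--pending === 0) {
        done(window.performance.now() - start);
      }
    };
    xhr.send(null);
  });
}

var urls = [];
for (var i = 1; i <= 10; i++) {
  urls.push('app://test.gaiamobile.org/data/file' + i + '.txt');
}
loadAll(urls, function (totalMs) {
  console.log('10 parallel XHRs finished in ' + totalMs.toFixed(2) + 'ms');
});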
Keywords: perf
Whiteboard: [c= p= u= s=]
Priority: -- → P3
Whiteboard: [c= p= u= s=] → [c=progress p= u= s=]
See Also: → 820684
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX