Closed
Bug 658074
Opened 13 years ago
Closed 13 years ago
Android NDK5 perf regressions
Categories
(Core :: IPC, defect)
RESOLVED
FIXED
mozilla8
People
(Reporter: azakai, Assigned: jchen)
References
Details
(Keywords: mobile, perf, Whiteboard: [inbound])
Attachments
(5 files, 1 obsolete file)
In bug 617115 we are preparing to move mobile from the Android NDK4 to NDK5. The newer NDK is supposed to generate faster code, by around 20%. We are seeing the following:

1. Canvas benchmarks are 25% faster (c5bench.com), which roughly confirms the NDK5 claims.
2. JS benchmarks with the JIT *disabled*, like SunSpider and V8, are 12-15% faster.
3. JS benchmarks with the JIT *enabled*: SunSpider and V8 are about the same speed, Kraken is slightly slower. Specifically, the 'imaging' tests in Kraken are 10-30% slower; other parts of Kraken seem about the same as before.

So it looks like JIT stuff is slower while normal compiled code is faster. We are seeing this on a variety of Android devices, from an old 2.1 Galaxy S to a 2.3.3 HTC Desire. The JIT slowdown is most noticeable in Kraken's imaging tests, but I suspect we are also slower in V8, since I think we spend some time in GC there - which is the kind of code I'd expect to be sped up. So I suspect a GC speedup is being masked by a JIT slowdown there.
STR:

Build Fennec as described in https://wiki.mozilla.org/Mobile/Fennec/Android#Developing_Fennec_for_Android

Get both NDK4 and NDK5 from http://developer.android.com/sdk/ndk/index.html

For NDK4, use this mozconfig (changing paths accordingly):

mk_add_options MOZ_OBJDIR=../android/fennec
ac_add_options --with-android-sdk="/home/alon/Dev/android/android-sdk-linux_x86/platforms/android-8"
ac_add_options --with-android-tools="/home/alon/Dev/android/android-sdk-linux_x86/tools"
# Android NDK4
ac_add_options --with-android-ndk="/home/alon/Dev/android/android-ndk-r4c"
ac_add_options --target=arm-android-eabi
ac_add_options --enable-application=mobile
ac_add_options --with-endian=little
ac_add_options --disable-tests
export MOZ_DEBUG_SYMBOLS=1

For NDK5, use this mozconfig:

mk_add_options MOZ_OBJDIR=../android/fennec
ac_add_options --with-android-sdk="/home/alon/Dev/android/android-sdk-linux_x86/platforms/android-8"
ac_add_options --with-android-tools="/home/alon/Dev/android/android-sdk-linux_x86/tools"
# Android NDK5
ac_add_options --with-android-ndk="/home/alon/Dev/android/android-ndk-r5b"
ac_add_options --with-android-version=5
ac_add_options --with-android-toolchain="/home/alon/Dev/android/android-ndk-r5b/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86"
ac_add_options --with-android-platform="/home/alon/Dev/android/android-ndk-r5b/platforms/android-5/arch-arm"
ac_add_options --target=arm-linux-androideabi
ac_add_options --disable-elf-hack
ac_add_options --enable-application=mobile
ac_add_options --with-endian=little
ac_add_options --disable-tests
export MOZ_DEBUG_SYMBOLS=1

(Until bug 617115 lands, you also need the patches in that bug.)
Comment 1•13 years ago
(In reply to comment #0)
> Get both the NDK4 and 5 from http://developer.android.com/sdk/ndk/index.html

Actually, you will need to get NDK r4 from here: http://ftp.mozilla.org/pub/mozilla.org/mobile/source/android-ndk-r4c-0moz3.tar.bz2

This is based on the official NDK r4 sources, but modified to include C++ standard library files and queue.h (bug 569760). The tarball linked above is prebuilt for Linux x86, and is the version used on Mozilla's build infrastructure.
Reporter
Comment 2•13 years ago
Btw, I'd be happy to create builds for anyone that wants to investigate this issue. Feel free to ping me on irc.
Updated•13 years ago
Assignee: general → cdleary
Comment 3•13 years ago
The patches needed to build with NDK5 have landed on trunk, except for the Breakpad changes (which aren't needed for the JS shell).
Reporter
Comment 4•13 years ago
The breakpad changes have landed as well, so everything needed is in m-c.
Comment 5•13 years ago
So, the generated JIT code is pretty much guaranteed not to change from NDK to NDK. The assembly that's generated will be the same before and after, so for each of these regressions it's either 1) related to the compiled code performance, which is more likely, or 2) related to the way that the new NDK handles OS-interfacing stuff, like marking allocated memory as executable and jumping to it. To be honest I have no idea how to profile the JS shell on android. For the prior JIT work I did I could inspect the generated ARM assembly, which is easier than identifying slow bits of unknown parts of JS' C++ runtime. Maybe we could delegate this to somebody who has more C++-on-android profiling experience? Taking the delta of the prior NDK's profile and the current one should point out all the C++ code that has actually gotten slower.
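The "OS-interfacing stuff" mentioned above can be illustrated with a minimal POSIX sketch (this is an assumption-laden illustration, not Mozilla's actual JIT allocator): JIT output is written into an anonymous read/write mapping, which is then flipped to executable before the engine jumps into it. If this path behaved differently under the new NDK, JIT-heavy workloads would suffer without any change to ordinary compiled C++ code.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical sketch: allocate a page for JIT output and make it
// executable, the way a typical JIT's OS layer does it.
void* alloc_executable(std::size_t size) {
    void* buf = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return nullptr;

    // ... machine code would be emitted into buf here ...

    // Flip the page to read+execute before jumping into it.
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return nullptr;
    }
    return buf;
}
```

The cost here is dominated by the kernel, not the compiler, which is why a toolchain change affecting it would be surprising but not impossible.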
Reporter
Comment 6•13 years ago
(In reply to comment #5)
> So, the generated JIT code is pretty much guaranteed not to change from NDK
> to NDK.

Hmm, does the JIT optionally use things like NEON or Thumb? (Maybe detection of those features changed in the new NDK?)

> To be honest I have no idea how to profile the JS shell on android.

Not clear to me either. Last I heard, we didn't have valgrind or oprofile running on Android, which is very sad. I heard stechz was using pcwalton's piranha profiler.
Comment 7•13 years ago
(In reply to comment #6)
> Hmm, does the JIT optionally use things like NEON or Thumb? (Maybe detection
> of those features changed in the new NDK?)

The JIT doesn't use Thumb or NEON. It does make use of VFP, and it can use some ARMv7 instructions. However, I would expect the impact of a failed detection to be rather different from what we see. For example, some of the maths routines would surely be affected by VFP as much as the imaging tests, and the ARMv7 instructions should benefit everything approximately equally. This doesn't rule it out as a possibility, of course.

> To be honest I have no idea how to profile the JS shell on android.
>
> Not clear to me either. Last I heard, we didn't have valgrind or oprofile
> running on Android which is very sad.

I've also been struggling to get a profiler to work on Android. ARM's Streamline profiler can do it (very effectively) but I had some unrelated trouble with the board I was using, so got set back a bit. Streamline needs a kernel module, just like OProfile, so profiling on a real phone is not straightforward.
Comment 8•13 years ago
(In reply to comment #7) Jacob, do you feel comfortable taking assignment, since I don't have the means necessary to see this bug closed?
Comment 9•13 years ago
(In reply to comment #8)
> Jacob, do you feel comfortable taking assignment, since I don't have the
> means necessary to see this bug closed?

Yep, I can take it, though if anyone knows anything potentially useful about the Android toolchains and libraries then I'd be quite happy to take advice! I notice that NDK5 includes a C++ library, so we no longer need the NDK modification that we had for NDK4. I wonder if the C++ stuff for NDK4 is faster, somehow. A profile run should show that up, anyway, but it's at least a potential culprit here.
Updated•13 years ago
Assignee: cdleary → Jacob.Bramley
Comment 10•13 years ago
Ok, after battling with my board to try to make it useful, I've got a rough profile run of Fennec built with NDK4 and NDK5, running the V8 and Kraken suites. The results aren't terribly enlightening because the interesting bits seem to be hidden behind areas without symbols. This is normally evidence of JavaScript code, but it's possible that I've just mis-loaded the symbols somehow.

The only meaningful observation I've been able to make so far is that in NDK5 (running Kraken's imaging tests), we're spending about 25% of our time in org.mozilla.fennec_unofficial and about 65% of the time in plugin-container, but in NDK4 about 90% of the time is spent in plugin-container. Basically, something in the fennec_unofficial process is getting much slower with NDK5, but I'm not sure exactly what it is (or what each process is responsible for, even). Actually I was quite surprised to see plugin-container so high up the profile as there are no plugins on the page. Does everything but the browser chrome get run in the plugin-container?

Something interesting was that in NDK4, most of the time is spent in plugin-container, but with occasional bursts of action in fennec_unofficial. These bursts occur perhaps every 20-30 seconds and take about 2 seconds. The spike occurs in libc.so and fennec_unofficial itself (though for some reason there's no other symbol information). It doesn't seem to have much overall impact. NDK5 is rather different: I see the same bursts, but they last much longer. The activity during these longer bursts is concentrated in the JavaScript engine. I see lots of js:: functions as well as a significant anonymous region, which I assume is JIT-compiled code.

Another observation is that the test uses _approximately_ 288MB under NDK4, but just 235MB under NDK5.

I thought it would be more useful to profile the JIT as a standalone shell application, but I get a weird "text busy" error when I try to run it. Investigation pointed to some unlikely-looking file system error, so I think something might be getting mixed up here.
Comment 11•13 years ago
plugin-container is a misleading name for the electrolysis subprocess in Fennec (commonly called the "content" process), which is precisely where these benchmarks would run. So your analysis makes perfect sense in that regard; it's precisely what has been reported here.
Comment 12•13 years ago
(In reply to comment #10)
> Actually I was quite surprised to see plugin-container so high up the
> profile as there are no plugins on the page. Does everything but the browser
> chrome get run in the plugin-container?

plugin-container _is_ the web content process. As Brian notes, the process is misnamed.

> Another observation is that the test uses _approximately_ 288MB under NDK4,
> but just 235MB under NDK5.

This is great news. Less memory usage is great.
Comment 13•13 years ago
So it sounds like with NDK5 the chrome process is taking a lot more CPU... Is it possible to breakpoint during one of those CPU usage points and examine the JS stack to see exactly what JS is running in the chrome process there?
Comment 14•13 years ago
Here's another observation: the unresponsive script warning seems to pop up at the start of an activity burst in fennec_unofficial. The burst then gets interrupted, of course, but resumes again once the warning is dismissed. Is there some kind of internal tick or event queue that is being serviced here? I'll try to get a debugger to stop in a useful spot, though it might be tricky.
Comment 15•13 years ago
Another example of the unresponsive script warning interrupting Kraken's imaging tests.
Comment 16•13 years ago
(In reply to comment #12)
> > Another observation is that the test uses _approximately_ 288MB under NDK4,
> > but just 235MB under NDK5.
>
> This is great news. Less memory usage is great.

Take that with a massive handful of salt. The memory usage statistic is system-wide, and whilst I didn't change anything other than Fennec itself, there are too many variables for it to be a reliable measurement.
Comment 17•13 years ago
Reporter
Comment 18•13 years ago
I think we expect to get unresponsive script warnings during benchmarks like Kraken. Usually we need to raise the script timeout (dom.max_script_run_time, IIRC) in order to avoid that warning messing up the benchmark results.

In general we would expect to see little or no activity in the parent process (fennec_unofficial) during a benchmark that doesn't render anything, whereas the child process (the misleadingly named plugin-container) should max out the CPU. So any activity in the parent is suspicious. If the script timeout warning is disabled (by setting dom.max_script_run_time to something very big), is there still any activity in the parent process? If so, that sounds very wrong.

It seems hard to explain why changing NDK would affect this, though. Random guess: perhaps we have an event loop that sleeps or waits on a condition var, and somehow in the new NDK that turns into a busy wait?
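The "random guess" above can be illustrated with a minimal sketch (all names here are invented for illustration; this is not Mozilla's event loop): a well-behaved loop blocks on a condition variable and burns essentially no CPU while idle, whereas a degraded polling loop would show up in profiles exactly the way the idle parent process does.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical event queue. wait_for_event() sleeps in the kernel until
// signalled (or until the timeout), rather than spinning on has_event.
struct EventQueue {
    std::mutex m;
    std::condition_variable cv;
    bool has_event = false;

    // Returns true if an event arrived before the timeout.
    bool wait_for_event(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        return cv.wait_for(lock, timeout, [this] { return has_event; });
    }

    void post_event() {
        { std::lock_guard<std::mutex> lock(m); has_event = true; }
        cv.notify_one();
    }
};
```

If a library or toolchain change silently replaced the blocking wait with repeated short timeouts, the process would look busy in libc even while doing no useful work.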
Comment 19•13 years ago
(In reply to comment #18)
> If the script timeout warning is disabled (by setting
> dom.max_script_run_time to something very big), is there still any activity
> in the parent process? If so, that sounds very wrong.

Ah, I'd set it to 0 to disable it, but that had no effect. I don't know why I didn't think to set it really high. In any case, disabling the timeout warning just stops the interruptions. I still see the activity bursts in fennec_unofficial, but they're just not interrupted with periods of inactivity (whilst the warning was showing).

> It seems hard to explain why changing NDK would affect this though. Random
> guess: Perhaps we have an event loop that sleeps or waits on a condition
> var, and somehow in the new NDK that turns into a busy wait?

That sounds likely. A compiler change (with new optimizations) could introduce such behaviour.
Reporter
Comment 20•13 years ago
Did some profiling investigation with dougt. It turns out there is some expected activity in the parent process:

1. Kraken loads each individual benchmark as an HTML document, so we get IPC messages from child to parent to change the title.
2. The benchmarks use document.write or innerHTML to write out data. This uses a wyciwyg channel, which is remoted from the child to the parent. There is also caching on this channel, which also triggers IPC messages.

However, none of these should take a substantial amount of processing, and they occur before or after the actual benchmarks run, so they should not impact the scores. So these cannot explain the speed regression we are seeing - it must be something else. Aside from this stuff, nothing else appears to be running in the parent.
Comment 21•13 years ago
This annotated profile shows that the extra bursts don't coincide with the JavaScript tests (perhaps indicating that there are two separate issues here). However, the pattern is quite different to what I saw before so something else must have changed along the way. I modified the benchmark to send and cancel the annotations (outside the timing loop), but it is otherwise unmodified.
Reporter
Comment 22•13 years ago
Very interesting. If I read that correctly, then activity in the parent happens right as the benchmarks begin. That makes sense, since we load an HTML document at that point, which involves the parent (so we should see both child and parent activity then). In theory that shouldn't slow down benchmarks, however, unless the benchmark timing includes the HTML parse time etc., in which case a slowdown in our processing in the parent could regress performance.

jchen, once you get piranha or oprofile set up, can you please compare NDK4 and NDK5 builds here? That might help us understand which specific functions are slower. (Note that building with NDK5 is tricky right now, see bug 663214. There is a workaround there though.)
Comment 23•13 years ago
Hmm. I just double-checked, and I don't actually see much difference between NDK4 and NDK5 on the imaging test. Yes, I get the weird profile view, but the actual results vary very little. NDK4 got me about 40990ms for the imaging tests, and NDK5 got me about 41684ms. The strange regions in fennec_unofficial occur in libc.so. I don't have symbols for my system libraries so I don't know what it's doing, but I'll try to work out how it's getting there.
Assignee
Comment 24•13 years ago
(In reply to comment #23)
> The strange regions in fennec_unofficial occur in libc.so. I don't have
> symbols for my system libraries so I don't know what it's doing, but I'll
> try to work out how it's getting there.

I've been profiling NDK4 and NDK5 builds running Kraken too. Here's a typical NDK4 profile:

samples  %        image name                                    app name             symbol name
21591    30.5380  anon (tgid:2961 range:0x41a53000-0x41adb000)  plugin-container     anon (tgid:2961 range:0x41a53000-0x41adb000)
9808     13.8723  anon (tgid:2961 range:0x41adb000-0x41b11000)  plugin-container     anon (tgid:2961 range:0x41adb000-0x41b11000)
4411     6.2389   anon (tgid:2961 range:0x41a53000-0x41a91000)  plugin-container     anon (tgid:2961 range:0x41a53000-0x41a91000)
3868     5.4708   anon (tgid:2961 range:0x41adb000-0x41af0000)  plugin-container     anon (tgid:2961 range:0x41adb000-0x41af0000)
3731     5.2771   libm.so                                       libm.so              /system/lib/libm.so
2465     3.4865   anon (tgid:2961 range:0x42900000-0x42b00000)  plugin-container     anon (tgid:2961 range:0x42900000-0x42b00000)
1880     2.6590   libc.so                                       libc.so              /system/lib/libc.so
1564     2.2121   anon (tgid:2961 range:0x41a53000-0x41a90000)  plugin-container     anon (tgid:2961 range:0x41a53000-0x41a90000)
1324     1.8726   anon (tgid:2961 range:0x41a95000-0x41ad5000)  plugin-container     anon (tgid:2961 range:0x41a95000-0x41ad5000)
876      1.2390   anon (tgid:2961 range:0x41a53000-0x41ad6000)  plugin-container     anon (tgid:2961 range:0x41a53000-0x41ad6000)
835      1.1810   anon (tgid:2961 range:0x41b11000-0x41d00000)  plugin-container     anon (tgid:2961 range:0x41b11000-0x41d00000)

Here's a typical NDK5 profile:

samples  %        image name                                    app name             symbol name
14573    18.7673  anon (tgid:3184 range:0x41977000-0x419e4000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419e4000)
12488    16.0822  libc.so                                       libc.so              /system/lib/libc.so
6505     8.3772   anon (tgid:3184 range:0x41977000-0x419e0000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419e0000)
5476     7.0521   anon (tgid:3184 range:0x41977000-0x419e3000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419e3000)
4405     5.6728   anon (tgid:3184 range:0x41977000-0x419c3000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419c3000)
3703     4.7688   libm.so                                       libm.so              /system/lib/libm.so
3353     4.3180   anon (tgid:3184 range:0x42000000-0x42600000)  plugin-container     anon (tgid:3184 range:0x42000000-0x42600000)
2070     2.6658   anon (tgid:3184 range:0x419b7000-0x419e4000)  plugin-container     anon (tgid:3184 range:0x419b7000-0x419e4000)
1761     2.2678   anon (tgid:3184 range:0x42a00000-0x42c00000)  plugin-container     anon (tgid:3184 range:0x42a00000-0x42c00000)
1177     1.5158   anon (tgid:3184 range:0x41977000-0x419d8000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419d8000)
1080     1.3908   anon (tgid:3184 range:0x41977000-0x419c5000)  plugin-container     anon (tgid:3184 range:0x41977000-0x419c5000)
890      1.1462   anon (tgid:3184 range:0x419e4000-0x41af4000)  plugin-container     anon (tgid:3184 range:0x419e4000-0x41af4000)
883      1.1371   libxul.so (deleted)                           libxul.so (deleted)  js_DoubleToInt32(double)

I think this agrees with what you've found. The 31% JIT task in NDK4 is split into 19% JIT and 16% libc in NDK5. I looked at the addresses and most of the time in libc is spent in memcpy. However, I haven't been able to get a backtrace of who is calling memcpy.
Reporter
Comment 25•13 years ago
To summarize some more stuff we've discussed offline:

1. memcpy is being called to copy megabytes of IPC data, through a wyciwyg channel. This happens in the image tests and not elsewhere, which makes sense so far. This should take time in the parent, possibly also the child (not sure how we copy IPC data there).
2. It might be slower in NDK5 if NDK5 doesn't optimize native functions like memcpy, e.g. if it passes -fno-builtin by default. Probably worth building without that and seeing the difference. Not having -fno-builtin in NDK4 would explain why we don't see it in those traces.
Assignee
Comment 26•13 years ago
Basically the regression is from std::string usage in the Chromium IPC code. IPC keeps a buffer of incoming messages, and assigns any unprocessed messages back to this buffer. However, if we didn't do any processing in the first place, the code will still try to assign the buffer to the buffer itself. My guess is our old STL guarded against assigning a string to itself; the new STL happily allocated and copied another buffer as part of the "assignment", which led to excessive memcpying for no reason.

The comprehensive Kraken results are below:

NDK4 (unpatched): 88845ms / 0.4%
NDK4 (patched): 88808ms / 0.4%
NDK5 (unpatched): 95180ms / 0.3%
NDK5 (patched): 88326ms / 0.3%

In addition to not assigning the buffer to itself, the patch also tries to optimize memory allocation of the buffer, but the effect is negligible.
Attachment #542978 - Flags: review?(benjamin)
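The self-assignment pattern described in comment 26 can be sketched as follows. This is a minimal illustration with invented names, not the actual Chromium code: after a parsing pass, the unconsumed tail [p, end) must be kept for the next read. `overflow_start` is non-null when p points into the overflow buffer itself, and null when p points into the fresh read buffer.

```cpp
#include <string>

// Keep the unconsumed tail [p, end) in `overflow` for the next read.
void retain_tail(std::string& overflow, const char* overflow_start,
                 const char* p, const char* end) {
    if (p == end) {
        overflow.clear();                       // everything was consumed
    } else if (!overflow_start) {
        overflow.assign(p, end - p);            // tail lives in the read buffer
    } else if (p > overflow_start) {
        overflow.erase(0, p - overflow_start);  // drop consumed prefix in place
    }
    // else p == overflow_start: tail is already in place. The buggy code
    // unconditionally did overflow.assign(p, end - p) even in this case,
    // which an STL without a self-assignment guard services with a fresh
    // allocation plus a memcpy of the whole buffer on every read.
}
```

The fix is simply to recognize the "nothing consumed" case and do no work at all, instead of relying on the STL to detect self-assignment.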
Reporter
Comment 27•13 years ago
Comment on attachment 542978 [details] [diff] [review]
Patch to optimize string usage in IPC

Nice find, jchen! cjones, do you know if there is an upstream that can receive bug reports?
Attachment #542978 - Flags: review?(benjamin) → review?(jones.chris.g)
Reporter
Updated•13 years ago
Assignee: Jacob.Bramley → nchen
Component: JavaScript Engine → IPC
QA Contact: general → ipc
Summary: Android NDK5 JIT regressions → Android NDK5 perf regressions
Comment 28•13 years ago
Comment on attachment 542978 [details] [diff] [review]
Patch to optimize string usage in IPC

> // Process messages from input buffer.
>- const char *p;
>+ const char *p, *origp;

Nit: to match surrounding style, please put this decl on a separate line. Please rename it |overflowp|.

> const char *end;
> if (input_overflow_buf_.empty()) {
>+ origp = NULL;
> p = input_buf_;
> end = p + bytes_read;
> } else {
> if (input_overflow_buf_.size() >
> static_cast<size_t>(kMaximumMessageSize - bytes_read)) {
> input_overflow_buf_.clear();
> LOG(ERROR) << "IPC message is too big";
> return false;
> }
>+ size_t buf_size = input_overflow_buf_.size();
>+ if (buf_size + bytes_read > input_overflow_buf_.capacity()) {
>+ buf_size += bytes_read > buf_size / 4 ? bytes_read : buf_size / 4;
>+ input_overflow_buf_.reserve(buf_size);
>+ }

Why are you implementing a custom buffer-growth policy here? I think this may be what you described having a negligible effect. If so, please remove it.

>@@ -566,17 +572,23 @@ bool Channel::ChannelImpl::ProcessIncomi
> listener_->OnMessageReceived(m);
> }
> p = message_tail;
> } else {
> // Last message is partial.
> break;
> }
> }
>- input_overflow_buf_.assign(p, end - p);
>+ if (end == p) {
>+ input_overflow_buf_.clear();
>+ } else if (!origp) { // p is from input_buf_
>+ input_overflow_buf_.assign(p, end - p);
>+ } else if (p > origp) { // p is from input_overflow_buf_
>+ input_overflow_buf_.erase(0, p - origp);

Let's put these comments on their own lines too to match surrounding style.

Nice fix! Let's figure out the custom buffer growth, then I'm happy to take this.
Attachment #542978 - Flags: review?(jones.chris.g)
Comment 29•13 years ago
(In reply to comment #27)
> cjones, do you know if there is an upstream that can receive bug reports?

Our code is a de facto fork, but I took a look at chromium trunk and this fix would apply. We could/should be good citizens and file an issue at http://code.google.com/p/chromium/issues. I can take care of this after this patch lands, if jchen is busy with other stuff.
Assignee
Comment 30•13 years ago
Thanks for the comments!

(In reply to comment #28)
> Nit: to match surrounding style, please put this decl on a
> separate line. Please rename it |overflowp|.

Done.

> Why are you implementing a custom buffer-growth policy here? I
> think this may be what you described having a negligible effect.
> If so, please remove it.

Right, this had no noticeable effect. Removed.

> Let's put these comments on their own lines too to match
> surrounding style.

Done. It'd be great if you can follow this up with chromium. Thanks!
Attachment #542978 - Attachment is obsolete: true
Attachment #544001 - Flags: review?(jones.chris.g)
Comment 31•13 years ago
Comment on attachment 544001 [details] [diff] [review]
Patch to optimize string usage in IPC (v1.1)

Thanks.
Attachment #544001 - Flags: review?(jones.chris.g) → review+
Comment 32•13 years ago
Were these builds using STLPort built into NDK5 or gcc's STL impl?
Assignee
Comment 33•13 years ago
(In reply to comment #32)
> Were these builds using STLPort built into NDK5 or gcc's STL impl?

NDK5 STLPort. That's one thing that's changed from NDK4 to NDK5.
Comment 34•13 years ago
That sounds a little scary. Is there no way for us to keep using our own NDK with gcc's STL impl? Otherwise I'm afraid we might have to keep shaking these kinds of regressions out, and we have fewer perf tests for Android than any of our other Tier I-ish platforms.

I'm not against taking this patch even if we go back to gcc's STL, though.
Assignee
Comment 36•13 years ago
(In reply to comment #34)
> That sounds a little scary. Is there no way for us to keep using our own
> NDK with gcc's STL impl? Otherwise I'm afraid we might have to keep shaking
> these kinds of regressions out, and we have fewer perf tests for android
> than any of our other Tier I-ish platforms.
>
> I'm not against taking this patch even if we go back to gcc's STL, though.

I wonder in how many places we are using STL. Using STLPort and not requiring a custom NDK build means we can better keep up with new versions of the NDK.
Keywords: checkin-needed
Comment 37•13 years ago
I understand the motivation for using the stock NDK5. All the performance-critical uses of STL are in IPC code, AFAIK.

One way to get a rough estimate of before/after perf is to build the IPDL tests (--enable-ipdl-tests) and run the latency check ($dist/bin/ipdlunittest TestLatency) with NDK5/STLPort vs. NDK5/gcc (or, less scientifically, NDK4 vs. NDK5/STLPort). That test ought to be bound by OS and Gecko event-loop stuff, but STL badness might show up. If we can't measure a difference (not capable of measuring, or no measured difference), then we don't have a reason to use gcc over STLPort.
Comment 38•13 years ago
Looks like bz landed this on m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/4836dc515ac2 [updating from "other branch"-->"Trunk" after verifying that's correct w/ jchen]
Comment 39•13 years ago
http://hg.mozilla.org/mozilla-central/rev/4836dc515ac2
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Comment 40•13 years ago
For those of us following along at home: could you explain how an inefficiency in the IPC code was influencing our Kraken score? Looking at the Kraken harness, it seems like only pure JS is run in the timed interval. Is something happening in parallel that is stealing resources from the content process? Thanks, and good job solving the problem!
Assignee
Comment 41•13 years ago
Thanks! My understanding so far is that cache data is being sent across IPC in parallel with the benchmark running.

Specifically, when Kraken uses document.write in a frame to start a test, document.write caches that content using WyciwygChannel. In Fennec, this means we send the cache data from child to parent asynchronously. This bug shows up when the parent is not able to receive all the cache data at once. Every time the parent reads an additional chunk of incoming data, it assigns the incoming buffer to itself, resulting in an unnecessary alloc and memcpy. This extra memcpy in the parent takes away CPU time from the child running the benchmark, making the benchmark slower.

The effect becomes noticeable when a lot of data is being sent, which is the case when running the Kraken image tests (megabytes of image data). The parent receives one chunk of cache data at a time, and every time, it allocates a new buffer and copies over everything it has received so far. This results in a cumulative memcpy of hundreds of megabytes, which is too much even for Android's super optimized memcpy :)
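A back-of-the-envelope model shows why the cumulative copying blows up (the function and its numbers are illustrative, not measured): if the receiver re-copies everything it has accumulated so far each time a new chunk arrives, the total bytes copied grow quadratically with the payload size.

```cpp
#include <cstddef>

// Total bytes copied when the whole accumulated buffer is copied again
// on every incoming chunk (the pathological pattern described above).
std::size_t bytes_copied_recopying(std::size_t total, std::size_t chunk) {
    std::size_t copied = 0;
    for (std::size_t received = chunk; received <= total; received += chunk)
        copied += received;  // whole accumulated buffer copied once more
    return copied;
}
```

For example, 10 bytes arriving in 2-byte chunks copies 2+4+6+8+10 = 30 bytes rather than 10; at the megabyte scale of the Kraken image data, the same quadratic growth reaches the "hundreds of megabytes" of cumulative memcpy described above.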
Comment 42•13 years ago
Thanks for the detailed explanation! Btw, was this measured on a single or dual core ARM platform? I'm curious because I thought we are using Tegras and I thought Tegras were dual-core and, if so, this is a very interesting data point concerning (I assume) memory bandwidth.
Comment 43•13 years ago
(In reply to comment #41)
> Specifically, when Kraken uses document.write in a frame to start a test,
> document.write caches that content using WyciwygChannel. In Fennec, this
> means we send the cache data from child to parent asynchronously.

That sounds, ah, highly undesirable :/. I suppose we cache this for view-source on desktop? Does Fennec support view-source? I couldn't find the option. If not, boy would it be nice to not uselessly cache hundreds of megabytes of document.write().
Comment 44•13 years ago
Fennec has view source, but no UI for it unless you install mfinkle's Mobile Tools add-on.
Comment 45•13 years ago
> I suppose we cache this for view-source on desktop?
No, we cache it so that reload will do the right thing. It needs to reload the data that was written with document.write, which means we need that data available....
Huh, ok. I guess that being the "right thing" is in the eye of the beholder, but that's a topic for elsewhere ;).
Comment 47•13 years ago
"right thing" in this case is determined by web compat... There are sites that call reload() and expect that behavior.
Yeah, figured.
Assignee
Comment 49•13 years ago
(In reply to comment #42)
> Btw, was this measured on a single or dual core ARM platform? I'm curious
> because I thought we are using Tegras and I thought Tegras were dual-core
> and, if so, this is a very interesting data point concerning (I assume)
> memory bandwidth.

I did my measurements on a single-core phone, although I should do it on a dual-core too. Might uncover some other interesting stuff :)
Comment 50•13 years ago
I measured on a dual-core Atrix and saw similar results.