Closed
Bug 832942
Opened 12 years ago
Closed 12 years ago
Searches to Google.com over SSL cause OOM error page on ARMv6 builds
Categories
(NSS :: Libraries, defect, P1)
Tracking
(firefox19- unaffected, firefox20+ fixed, firefox21+ fixed, firefox22+ verified)
VERIFIED
FIXED
3.15
People
(Reporter: mamozrk, Assigned: cviecco)
References
Details
(Keywords: regression, reproducible, Whiteboard: [ARMv6])
Attachments
(6 files, 2 obsolete files)
2.58 KB,
text/plain
|
Details | |
1.19 KB,
patch
|
Details | Diff | Splinter Review | |
1.17 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
1.74 KB,
patch
|
Details | Diff | Splinter Review | |
178.53 KB,
application/x-gzip
|
Details | |
1.17 KB,
patch
|
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
Searching from the URL bar (ie type words in bar, tap the "Go"/looking-glass icon --> get Google search results) results in a "Problem loading page" error:
Secure Connection Failed
An error occurred during a connection to www.google.com. security library: memory allocation failure. (Error code: sec_error_no_memory)
The URL turned out to be:
https://www.google.com/search?q=kittens&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial
I'm running Nightly. about: tells me this is version 21.0a1 (2013-01-20)
about:support tells me of the following "Important Modified Preferences"
browser.cache.disk.capacity 204800
browser.cache.disk.smart_size.first_run false
browser.cache.disk.smart_size.use_old_max false
browser.cache.disk.smart_size_cached_value 10240
browser.startup.homepage_override.mstone 21.0a1
dom.mozApps.used true
dom.w3c_touch_events.expose true
extensions.lastAppVersion 21.0a1
network.cookie.prefsMigrated true
privacy.donottrackheader.enabled true
I also have AdBlock Plus (v2.2.1) installed. Disabling this doesn't change things.
The phone is a ZTE Blade running Android 4.0.4 (ICS); it has an ARMv6 CPU, running at 600 MHz, and (according to the "About phone" in the settings) 421 MB of memory.
Browsing other https sites doesn't cause problems, neither does removing the "s" from the "https".
Using the stock ICS browser to search also works as expected (and is using https for searching in this way).
Comment 1•12 years ago
|
||
If you browse to about:config and search for “tls” and toggle “security.enable_tls” to false does this work for you?
What are the conditions for raising sec_error_no_memory?
Assignee: nobody → nobody
Component: General → Libraries
Product: Firefox for Android → NSS
Version: Firefox 21 → unspecified
Reporter | ||
Comment 2•12 years ago
|
||
(In reply to Aaron Train [:aaronmt] from comment #1)
> If you browse to about:config and search for “tls” and toggle
> “security.enable_tls” to false does this work for you?
Yes, toggling security.enable_tls to false makes this work. Is leaving this set at false OK for day-to-day browsing?
Comment 3•12 years ago
|
||
Not a good idea, I'm curious why you would be getting sec_error_no_memory though, hopefully someone tracking this can detail.
Comment 4•12 years ago
|
||
Does the error occur all the time, most of the time, sometimes, or rarely?
Reporter | ||
Comment 5•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #4)
> Does the error occur all the time, most of the time, sometimes, or rarely?
All the time. I've been running Nightly on this phone only since 2013-01-05 (or thereabouts) and I can't recall when I first noticed this. Probably some time last week; but of course I can't actually recall trying a search on the phone before then.
Is there any logging I can turn on?
Updated•12 years ago
|
Flags: needinfo?(bsmith)
I too can reproduce consistently using the armv6 Nightly build on a Samsung Galaxy Fit (GT-5670). Submitting the search using the built-in Google search engine from the addressbar shows the error
Comment 7•12 years ago
|
||
I can reproduce this, this is pretty terrible on Firefox for Android; you can't search.
status-firefox19:
--- → affected
status-firefox20:
--- → affected
status-firefox21:
--- → affected
status-firefox22:
--- → affected
tracking-firefox20:
--- → ?
tracking-firefox21:
--- → ?
tracking-firefox22:
--- → ?
Comment 9•12 years ago
|
||
Steps to reproduce
i) Install Nightly/Aurora/Beta
ii) Access the address bar and input a query such as "mozilla" and hit enter or the arrow
Severity: normal → major
Keywords: reproducible
Comment 10•12 years ago
|
||
The only GeckoConsole output
E/GeckoConsole( 1447): An error occurred during a connection to www.google.com:443.
E/GeckoConsole( 1447):
E/GeckoConsole( 1447): security library: memory allocation failure.
E/GeckoConsole( 1447):
E/GeckoConsole( 1447): (Error code: sec_error_no_memory)
E/GeckoConsole( 1447): An error occurred during a connection to www.google.com:443.
E/GeckoConsole( 1447):
E/GeckoConsole( 1447): security library: memory allocation failure.
E/GeckoConsole( 1447):
E/GeckoConsole( 1447): (Error code: sec_error_no_memory)
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Updated•12 years ago
|
Summary: searches to Google.com over SSL cause OOM error page → Searches to Google.com over SSL cause OOM error page on ARMv6 devices
Whiteboard: [ARMv6]
Updated•12 years ago
|
tracking-firefox19:
--- → ?
Comment 11•12 years ago
|
||
Aaron noted in email:
"It's intermittent which suggests that, I just happened to hit it a bunch of times this morning and saw the duplicate which reminded me about the original bug. For example, I just reinstalled 19 and now I'm not hitting it but did earlier; and 20/21/22."
Given that, we'll wontfix for FF19 unless we start getting an influx of negative feedback.
Josh - who on your team is in the best position (and has the time) to work with Mobile QA?
Comment 12•12 years ago
|
||
This seems to happen when my device is really low on memory obviously
MemTotal: 428364 kB
MemFree: 4000 kB
Worse on 21 and 22. Fennec is using so much memory on this device that a simple Google search can't be conducted in one tab after startup.
Comment 13•12 years ago
|
||
The symptoms are that when TLSv1 is enabled and the user does a search on https://google.com, we get sec_error_out_of_memory. When the user disables TLSv1 (security.enable_tls=false) or when the user users http://google.com instead, then the problem does not reproduce.
There is basically no difference between TLSv1 and SSLv3 in the PSM and NSS layers in terms of memory usage. The main difference between TLSv1 and SSLv3 is that we send TLS extensions with TLSv1 but not SSLv3. In particular, we send the NPN extension. This makes me think that the problem is related to SPDY. Did we do anything that would have increased the memory usage of SPDY recently? e.g. SPDY server push?
Does the problem reproduce when browsing to https://twitter.com? (Patrick: what are some other SPDY-enabled sites to check?)
Flags: needinfo?(bsmith) → needinfo?(mcmanus)
Comment 14•12 years ago
|
||
Please try doing the search on https://google.com again with pref network.http.spdy.enabled=false and report back the results.
Flags: needinfo?(aaron.train)
Comment 15•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #13)
> SPDY. Did we do anything that would have increased the memory usage of SPDY
> recently? e.g. SPDY server push?
>
no.. just small bug fixes.
> Does the problem reproduce when browsing to https://twitter.com? (Patrick:
> what are some other SPDY-enabled sites to check?)
https://facebook.com
https://cloudflare.com
https://www.webtide.com/
https://www.optimizely.com/
If this were a more general spdy memory thing it is kinda weird that the error is consistently coming out of nss instead of spread around. I wonder if google.com is taking spdy as a signal to do something more aggressive than usual at the ssl layer (we know they take that approach on the chrome side.)
can we get a stack and is it consistent?
Flags: needinfo?(mcmanus)
Comment 16•12 years ago
|
||
With network.http.spdy.enabled=false, I still get the same results (sec_error_no_memory). I am using Nightly (02/27) on my HTC Status (Android 2.3.4).
(In reply to Brian Smith (:bsmith) from comment #13)
> Does the problem reproduce when browsing to https://twitter.com?
Twitter search works for me; both with spdy and without.
Flags: needinfo?(aaron.train)
Updated•12 years ago
|
Assignee: joshmoz → bsmith
Reporter | ||
Comment 17•12 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #15)
> > Does the problem reproduce when browsing to https://twitter.com? (Patrick:
> > what are some other SPDY-enabled sites to check?)
twitter WFM
> https://facebook.com
this redirects to a non-https page
> https://cloudflare.com
this brings the same issue as google
> https://www.webtide.com/
> https://www.optimizely.com/
these two WFM.
All these with the latest Nightly: 22.0a1 (2013-02-27)
> can we get a stack and is it consistent?
I could try for a stack if I was told what to do. Special build? Extra app? Hook up to a PC with ADB?
Comment 18•12 years ago
|
||
The mobile team implemented custom fonts which increased the baseline memory usage in Fx21 bug 844669.
Comment 19•12 years ago
|
||
androiduser in #mobile mentions that 2013-01-03-09-07 works fine and that 2013-01-03-03-09-46 is bad
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a812ef63de87&tochange=6955309291ee
Bug 799267?
Comment 20•12 years ago
|
||
b2897468bd82 Bug 825453 - Bump ARMv6 mozconfigs to use NDK r8c and GCC 4.6
(Kartikaya Gupta, r=blassey,ted)
da2b24b1fc7c Bug 799267 - AuthCertificate Telemetry (measuring first auth,
pkix and classic) (cviecco, r=bsmith)
5c8e41454810 Bug 825151 - Bump ARMv7 mozconfigs to use NDK r8c and GCC 4.6
(Kartikaya Gupta, r=blassey,ted)
Also possibly related:
https://bugzilla.mozilla.org/show_bug.cgi?id=772144
Keywords: regression
Comment 21•12 years ago
|
||
Camilo, can you see if this is a regression caused by your patch?
Assignee: bsmith → cviecco
Assignee | ||
Comment 22•12 years ago
|
||
Mark (reporter)
As I do not have an arm6 device available. Would you mind installing the version located here: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cviecco@mozilla.com-689d47ffe068/try-android-armv6/ and let me know if this solves the issue
Comment 23•12 years ago
|
||
I installed the try build on my GT-5670 but I still get the error.
Reporter | ||
Comment 24•12 years ago
|
||
(In reply to Camilo Viecco (:cviecco) from comment #22)
> Mark (reporter)
>
> As I do not have an arm6 device available. Would you mind installing the
> version located here:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cviecco@mozilla.
> com-689d47ffe068/try-android-armv6/ and let me know if this solves the issue
as with John, that build didn't fix the issue at either google or cloudflare.
Assignee | ||
Comment 25•12 years ago
|
||
John, Mark thank you
Can you try this one?
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cviecco@mozilla.com-2945b87a34ac/try-android-armv6/
Comment 26•12 years ago
|
||
I got the error again.
Is there any way to provide more information for this bug? Do debug builds spit out more information in logcat?
Comment 27•12 years ago
|
||
I'm still able to reproduce this on Firefox for Android 20 Beta2 build 2 on LG Slider (Android 2.3.4)
Comment 28•12 years ago
|
||
We are now into our third beta for Firefox 20; we need a fix here.
Comment 29•12 years ago
|
||
Assignee | ||
Comment 30•12 years ago
|
||
It seems like it is something to to with the build system. I have not been able to rebuld for Jan 1/2/3 but a rebuild on Jan 31 version (ndk r5c instead of r8c does not show the problem).
Comment 31•12 years ago
|
||
Try push with the old NDK to confirm: https://tbpl.mozilla.org/?tree=Try&rev=e81dde77bd40
Assignee | ||
Comment 32•12 years ago
|
||
I am pretty sure that this related to the build system.
> hg update -d 'Jan 31'
> cat .mozconfig
# Add the correct paths here:
#ac_add_options --with-android-ndk="$HOME/android/android-ndk-r8c"
#ac_add_options --with-android-sdk="$HOME/android/android-sdk-linux/platforms/android-16"
#ac_add_options --with-android-version=8
ac_add_options --with-android-ndk="/opt/android-ndk-r5c"
ac_add_options --with-android-sdk="/opt/android-sdk-linux/platforms/android-16"
ac_add_options --with-android-version=5
# android options
ac_add_options --enable-application=mobile/android
ac_add_options --target=arm-linux-androideabi
ac_add_options --with-ccache
mk_add_options MOZ_OBJDIR=./objdir-droid
mk_add_options MOZ_MAKE_FLAGS="-j14 -s"
ac_add_options --with-arch=armv6
export MOZ_PKG_SPECIAL=armv6
ac_add_options --disable-ion
> make -f client.mk build_and_deploy
and it works. With the version 8 ndk I the bug reappears.
Comment 33•12 years ago
|
||
If you are able to reproduce the problem, can you try to attach gdb and debug it? We can revert the NDK change if it comes to that but I'd still like to re-enable it eventually and we need to figure out what's going wrong here in order to do that. I don't know where the sec_error_no_memory error comes from and how it propagates so it would be easier for somebody who knows that code to investigate.
Comment 34•12 years ago
|
||
I discussed this a bit with cviecco on IRC, and he agreed to look into it a bit further, to see where the sec_error_no_memory error is coming from.
With no knowledge of the relevant code, I assume this error code is a result of a failed malloc and I would hypothesize that some latent bug in our code is being tickled by the NDK change such that we're passing a bad value to the malloc. Usually android kills fennec directly if we're *actually* eating up all the memory, and clearly we still have enough memory to render the error page so I suspect it's more likely that we're just making some sort of abnormal allocation request that's failing.
If we are unable to make much headway on this in the next few days I can revert back to NDK r5c on beta while we continue investigation.
Comment 35•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> Try push with the old NDK to confirm:
> https://tbpl.mozilla.org/?tree=Try&rev=e81dde77bd40
Confirming this works on the HTC Status (NDK R5C)
Comment 36•12 years ago
|
||
Should have tried this earlier, you can use the ARMv6 builds on an ARMv7 device too; also able to reproduce on the Nexus 4.
Summary: Searches to Google.com over SSL cause OOM error page on ARMv6 devices → Searches to Google.com over SSL cause OOM error page on ARMv6 builds
Updated•12 years ago
|
Assignee | ||
Comment 37•12 years ago
|
||
definetely a linker problem:
relevant adb locat lines:
v5 (bug not shown)
03-05 18:38:25.422 E/GeckoLibLoad( 9686): Load nss start
03-05 18:38:25.892 E/GeckoLinker( 9686): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation to NULL @0x0002c6b4
03-05 18:38:25.912 E/GeckoLinker( 9686): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation to NULL @0x0002c45c for symbol "__cxa_begin_cleanup"
03-05 18:38:25.912 E/GeckoLinker( 9686): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation to NULL @0x0002c578 for symbol "__cxa_type_match"
03-05 18:38:25.932 E/GeckoLibLoad( 9686): Load nss done
v8 (with problem, notice libnss now appears with a warning).
03-05 18:04:58.922 E/GeckoLibLoad( 9142): Load nss start
03-05 18:04:59.102 E/GeckoLinker( 9142): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnss3.so: Warning: unhandled flags #8 not handled
03-05 18:04:59.132 E/GeckoLinker( 9142): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnssutil3.so: Warning: unhandled flags #8 not handled
03-05 18:04:59.142 E/GeckoLinker( 9142): /data/app/org.mozilla.fennec_cviecco-1.apk!/libplc4.so: Warning: unhandled flags #8 not handled
03-05 18:04:59.152 E/GeckoLinker( 9142): /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: unhandled flags #8 not handled
03-05 18:04:59.172 E/GeckoLinker( 9142): /data/app/org.mozilla.fennec_cviecco-1.apk!/libplds4.so: Warning: unhandled flags #8 not handled
03-05 18:04:59.182 E/GeckoLibLoad( 9142): Load nss done
Assignee | ||
Updated•12 years ago
|
Assignee: cviecco → bugmail.mozilla
Comment 38•12 years ago
|
||
CC'ing some others who might have some NDK5 vs NDK8 linker knowledge
Comment 39•12 years ago
|
||
(In reply to Camilo Viecco (:cviecco) from comment #37)
> definetely a linker problem:
>
> relevant adb locat lines:
> v5 (bug not shown)
> 03-05 18:38:25.422 E/GeckoLibLoad( 9686): Load nss start
> 03-05 18:38:25.892 E/GeckoLinker( 9686):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation
> to NULL @0x0002c6b4
> 03-05 18:38:25.912 E/GeckoLinker( 9686):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation
> to NULL @0x0002c45c for symbol "__cxa_begin_cleanup"
> 03-05 18:38:25.912 E/GeckoLinker( 9686):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: relocation
> to NULL @0x0002c578 for symbol "__cxa_type_match"
> 03-05 18:38:25.932 E/GeckoLibLoad( 9686): Load nss done
>
>
> v8 (with problem, notice libnss now appears with a warning).
> 03-05 18:04:58.922 E/GeckoLibLoad( 9142): Load nss start
> 03-05 18:04:59.102 E/GeckoLinker( 9142):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnss3.so: Warning: unhandled
> flags #8 not handled
> 03-05 18:04:59.132 E/GeckoLinker( 9142):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnssutil3.so: Warning:
> unhandled flags #8 not handled
> 03-05 18:04:59.142 E/GeckoLinker( 9142):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libplc4.so: Warning: unhandled
> flags #8 not handled
> 03-05 18:04:59.152 E/GeckoLinker( 9142):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libnspr4.so: Warning: unhandled
> flags #8 not handled
> 03-05 18:04:59.172 E/GeckoLinker( 9142):
> /data/app/org.mozilla.fennec_cviecco-1.apk!/libplds4.so: Warning: unhandled
> flags #8 not handled
> 03-05 18:04:59.182 E/GeckoLibLoad( 9142): Load nss done
None of these should be a problem.
Comment 40•12 years ago
|
||
I investigated this some more on an ARMv6 debug build with gdb, and the problem seems to be happening in the call to PORT_SetError(0) in pk11slot.c [1]. This call goes to secport.c:151 [2] which calls PR_SetError(0, 0) [3]. This is where things go wrong, because when stepping through with the debugger the error code coming into this method is 0, but ends up getting set to -8173. I've attached some of my gdb trace, with disassembly. Coming into the method, errorCode is stored in $r0 and copied to $r4. At address 0x4a9d380c after the call to PR_GetCurrentThread, $r4 is still correct (value is 0), but then as soon as it executes the "mov r3, #0" instruction $r4 ends up at 4294959123 (equivalent to -8173). As far as I can tell this makes no sense, particularly since the problem doesn't happen on *every* call to PR_SetError, only some of them.
[1] http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/pk11wrap/pk11slot.c#2059
[2] http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/util/secport.c#151
[3] http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prerror.c#23
Comment 41•12 years ago
|
||
Comment 42•12 years ago
|
||
Actually after a few more runs it looks like the $r0 is bad even going into the PR_SetError method. I'm not sure why it showed up differently in the above trace.
The call from secport.c:151 to PR_SetError goes through some sort of thunk layer:
(gdb) x/i $pc
=> 0x4a8a1260 <PORT_SetError_Util+16>: b 0x4a8964b4
(gdb) si
0x4a8964b4 in ?? () from /mozilla-git/obj-android-armv6/dist/bin/libnssutil3.so
(gdb) x/3i $pc
=> 0x4a8964b4: add r12, pc, #0, 12
0x4a8964b8: add r12, r12, #135168 ; 0x21000
0x4a8964bc: ldr pc, [r12, #2532]! ; 0x9e4
And $r0 changes from 0 to 4294959123 when the "ldr" instruction in the thunk layer is executed:
(gdb) si
0x4a8964b8 in ?? () from /mozilla-git/obj-android-armv6/dist/bin/libnssutil3.so
(gdb) si
0x4a8964bc in ?? () from /mozilla-git/obj-android-armv6/dist/bin/libnssutil3.so
(gdb) p $r12
$99 = 1250653372
(gdb) x/w $r12+2532
0x4a8b7ea0 <_GLOBAL_OFFSET_TABLE_+56>: 0x4a9d37fc
(gdb) p $r0
$100 = 0
(gdb) si
Breakpoint 7, PR_SetError (code=-8173, osErr=0) at /mozilla-git/nsprpub/pr/src/misc/prerror.c:24
24 {
(gdb) p $r0
$101 = 4294959123
(gdb) x/i $pc
=> 0x4a9d37fc <PR_SetError>: push {r3, r4, r5, lr}
Comment 43•12 years ago
|
||
After much debugging with glandium, we (well, he) figured out what was going on. GCC is miscompiling the sftk_mkPrivKey function in security/nss/lib/softoken/pkcs11.c file. Specifically, after the call to DER_SetUInteger in that function [1], the "bad" compilation compares the return value to an unused register r10, and then returns the CKR_HOST_MEMORY return value since the comparison fails. Here is a disassembly of the relevant code from NDK r5c:
ead8: ebffd023 bl 2b6c <NSS_3.4+0x2b6c>
eadc: e3500000 cmp r0, #0 ; 0x0
eae0: 13a00002 movne r0, #2 ; 0x2
and the bad code from NDK r8c:
0x4b06d0d0 <+604>: bl 0x4b062ce8
0x4b06d0d4 <+608>: mov r3, #0
0x4b06d0d8 <+612>: cmp r10, r3
0x4b06d0dc <+616>: movne r0, #2
0x4b06d0e0 <+620>: moveq r0, r3
[1] http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/softoken/pkcs11.c#1895
Comment 44•12 years ago
|
||
We came up with this patch which seems to do enough code rearrangement to make the compiler generate different code. Try build in progress at https://tbpl.mozilla.org/?tree=Try&rev=af9109e8baef
Assignee | ||
Comment 45•12 years ago
|
||
Awesome work!
Testing locally a different version of the patch (little less disrupting of the control flow).
Assignee | ||
Comment 46•12 years ago
|
||
push of an alternative patch (works locally)
https://tbpl.mozilla.org/?tree=Try&rev=94d253182343
Comment 47•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #43)
> After much debugging with glandium, we (well, he) figured out what was going
:kats, :glandium, :cviecco - that's pretty sweet stuff.
Comment 48•12 years ago
|
||
The trybuild shows no OOM messages when searching Google from the address-bar.
Comment 49•12 years ago
|
||
Wan-Teh, Ryan: I am pretty sure you use at least parts of NSS in your Android Chromium builds. If so, this bug may affect you.
I think the next step is to try to reproduce this using NSS as built using NSS's build system, to see if this is a bug in NSS or a bug in Mozilla's overrides of NSS's build system.
We should also see if the patch in bug 838989 resolves the issue or not, because IIRC that patch deals with ARMv6 compilation stuff.
Also, we should see if DER_SetUInteger is perhaps triggering undefined behavior due to some C language spec. violation. In particular, aliasing violations are a high probability.
Failing all of that, we should accept this patch. But, I do not like taking this patch when we have no idea why it works when the original code doesn't, given that it is supposed to be logically equivalent to the current code.
Comment 50•12 years ago
|
||
Chrome does not use NSS on Android at all.
Assignee | ||
Comment 51•12 years ago
|
||
Comment 52•12 years ago
|
||
Also, what was the motivation for upgrading to NDK 8? I am worried that if we have a compiler bug that affects *this* code, the same bug will affect other code as well. Do we have any way of being confident one way or the other?
Comment 53•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #49)
> We should also see if the patch in bug 838989 resolves the issue or not,
> because IIRC that patch deals with ARMv6 compilation stuff.
Even if that patch fixes the problem, that patch looks too complicated to uplift to beta (I could be wrong though). I would be more comfortable with landing the hack and uplifting it to beta, and then possibly reverting the hack and landing that patch instead on central, if that resolves the issue.
> Also, we should see if DER_SetUInteger is perhaps triggering undefined
> behavior due to some C language spec. violation. In particular, aliasing
> violations are a high probability.
It looked ok to me, but feel free to take a look.
>
> Failing all of that, we should accept this patch. But, I do not like taking
> this patch when we have no idea why it works when the original code doesn't,
> given that it is supposed to be logically equivalent to the current code.
FWIW I compiled that file with gcc-4.7 from NDK r8d using the same command line options and environment, and the assembly in the object file for that came out fine:
240: ebfffffe bl 0 <DER_SetUInteger>
240: R_ARM_CALL DER_SetUInteger
244: e3a03000 mov r3, #0
248: e1500003 cmp r0, r3
24c: 13a00002 movne r0, #2
250: 01a00003 moveq r0, r3
So AFAICT this is a bug in gcc-4.6 in the armv6 codegen only, and has been fixed in gcc-4.7.
(In reply to Brian Smith (:bsmith) from comment #52)
> Also, what was the motivation for upgrading to NDK 8?
Various and sundry, mostly smaller libraries and general perf improvements.
> I am worried that if
> we have a compiler bug that affects *this* code, the same bug will affect
> other code as well. Do we have any way of being confident one way or the
> other?
It might affect other code but I'm inclined to assume it doesn't until we find a case that proves otherwise. Alternatively we could track down the fix in the gcc source code, add a diagnostic message to gcc that gets printed out whenever the bug would have been triggered, and then compile fennec with this modified gcc and see how many times it gets hit.
Comment 54•12 years ago
|
||
:cviecco, I like your alt-patch better than my original patch, so reassigning this bug back to you. If you're happy with the patch please request review so we can get this in and uplifted as soon as possible.
Assignee: bugmail.mozilla → cviecco
Comment 55•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #53)
> It might affect other code but I'm inclined to assume it doesn't until we
> find a case that proves otherwise. Alternatively we could track down the fix
> in the gcc source code (...)
We should ask gcc developers.
Comment 56•12 years ago
|
||
Comment on attachment 721915 [details] [diff] [review]
alt-patch
>+ /* The following ifdef is completelly unnecesary, but needed for
>+ * android if using ndk8c and arm6 as it has a compiler bug
>+ */
It would be nice to update this comment with the compiler version when
we find out more from the GCC developers.
Attachment #721915 -
Flags: review+
Comment 57•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #49)
>
> Also, we should see if DER_SetUInteger is perhaps triggering undefined
> behavior due to some C language spec. violation. In particular, aliasing
> violations are a high probability.
I reviewed DER_SetUInteger and saw no problem. There is no pointer
aliasing.
Comment 58•12 years ago
|
||
A few additional data points:
- It's not limited to armv6. It happens with -march=armv7-a -marm
- The minimal flags I've reproduced it with an already preprocessed and slightly simplified source are -marm -Os.
- It doesn't happen at -O2 (default for plain NSS build, but softoken has ALLOW_OPT_CODE_SIZE=1)
Assignee | ||
Comment 61•12 years ago
|
||
Mike, thank you. Do the r+ patch works around the issue on arm7 too? If so we can probably land it today.
Comment 62•12 years ago
|
||
(In reply to Camilo Viecco (:cviecco) from comment #61)
> Mike, thank you. Do the r+ patch works around the issue on arm7 too? If so
> we can probably land it today.
The issue is present on any arm architecture as long as it's not built as thumb. So you could add a !defined(__thumb__)
Assignee | ||
Comment 63•12 years ago
|
||
latest push. Testing on all platforms, updated patch to include gcc bug id.
https://tbpl.mozilla.org/?tree=Try&rev=b18227edec50
Comment 64•12 years ago
|
||
As an update: I built a modified version of the gcc-4.6 toolchain in NDK r8c that includes the GCC patch that fixes the problem (as per [1]), and also added some output for when that code is exercised. I've attached the patch I used (I had to backport the actual patch to the gcc-4.6 code).
Then I built the latest mozilla-central with that modified toolchain and verified that the pkcs11.o file contains the proper assembly (and works fine on my phone). Unfortunately, the gcc code in question gets exercised on pretty much every file, and I don't know how to tell which hits would have tickled the bug.
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56561#c2
Comment 65•12 years ago
|
||
needinfo'ing nathan as glandium suggested his experience with GCC might be able to help here. Basically, given the GCC patch in comment 64, I'd like to find out when the buggy codepath in GCC is exercised, so that we can find if other files in the mozilla codebase will be impacted.
For more backstory you can start reading at comment 43 (or comment 0, if you want everything).
Flags: needinfo?(nfroyd)
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Reporter | ||
Comment 66•12 years ago
|
||
(In reply to Camilo Viecco (:cviecco) from comment #63)
> latest push. Testing on all platforms, updated patch to include gcc bug id.
> https://tbpl.mozilla.org/?tree=Try&rev=b18227edec50
I installed the build at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cviecco@mozilla.com-b18227edec50/try-android-armv6/
and this fixes the issue. \o/
Comment 67•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #65)
> needinfo'ing nathan as glandium suggested his experience with GCC might be
> able to help here. Basically, given the GCC patch in comment 64, I'd like to
> find out when the buggy codepath in GCC is exercised, so that we can find if
> other files in the mozilla codebase will be impacted.
It sounds like you are asking "what's the problematic code pattern so that we can audit several million lines of code to find other cases which might be fixed?" That sounds crazy, so could you please clarify what you're asking for? I don't see what the value of looking at other files in m-c is, since we already know this is a compiler bug.
Flags: needinfo?(nfroyd) → needinfo?(bugmail.mozilla)
Comment 68•12 years ago
|
||
Our "fix" is to juggle the code around to avoid the compiler bug. If we switched compilers to one without the bug, then yes, I would agree with you that there's no point in looking at other files in m-c. If we stick with the buggy compiler though then it is useful to know how much of our code will be impacted.
Flags: needinfo?(bugmail.mozilla)
Comment 69•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #68)
> Our "fix" is to juggle the code around to avoid the compiler bug. If we
> switched compilers to one without the bug, then yes, I would agree with you
> that there's no point in looking at other files in m-c. If we stick with the
> buggy compiler though then it is useful to know how much of our code will be
> impacted.
Ah, thanks for the clarification. Sticking with the buggy compiler still sounds like crazy talk, though. ;)
I don't see a usefully specific code pattern here; the best I can offer in lieu of sitting down with the compiler is "comparisons to zero" which unhelpfully translates to "everywhere".
One suggestion for guesstimating how much of a fix this provides:
- Build the source tree with extra options '-save-temps -fno-schedule-insns -fno-schedule-insns2' with an unmodified (buggy) and modified (fixed) compiler. -save-temps will produce .s files next to the .o files. -fno-schedule-insns* prevents the compiler from rearranging the instruction stream too much.
- Diff .s files at your leisure; the assembly files ought to differ only in places where bugs have been fixed. Ideally there will be no false positives from instruction rearrangement or differing register allocations.
Comment 70•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #69)
> - Diff .s files at your leisure; the assembly files ought to differ only in
> places where bugs have been fixed. Ideally there will be no false positives
> from instruction rearrangement or differing register allocations.
Isn't there chances of label numbering changes?
Comment 71•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #70)
> (In reply to Nathan Froyd (:froydnj) from comment #69)
> > - Diff .s files at your leisure; the assembly files ought to differ only in
> > places where bugs have been fixed. Ideally there will be no false positives
> > from instruction rearrangement or differing register allocations.
>
> Isn't there chances of label numbering changes?
Sure, it's not a foolproof method. But usually diffs in label numbers are harmless or they're surrounded by larger chunks of significant code changes.
Using the -y option to diff can be helpful for looking at assembly dumps too.
It's also worth noting that just because you (kats) were seeing those debug messages fire a lot doesn't necessarily mean that differences in assembly were always produced. combine.c is trying differing ways of sticking instructions together and you were likely seeing backtracking of unsuccessful combinations, rather than combinations that always affected final code generation.
Comment 72•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #71)
> It's also worth noting that just because you (kats) were seeing those debug
> messages fire a lot doesn't necessarily mean that differences in assembly
> were always produced. combine.c is trying differing ways of sticking
> instructions together and you were likely seeing backtracking of
> unsuccessful combinations, rather than combinations that always affected
> final code generation.
Yeah, I figured it might be something like that when I was looking through the code. :(
(In reply to Nathan Froyd (:froydnj) from comment #69)
> One suggestion for guesstimating how much of a fix this provides:
>
> - Build the source tree with extra options '-save-temps -fno-schedule-insns
> -fno-schedule-insns2' with an unmodified (buggy) and modified (fixed)
> compiler. -save-temps will produce .s files next to the .o files.
> -fno-schedule-insns* prevents the compiler from rearranging the instruction
> stream too much.
> - Diff .s files at your leisure; the assembly files ought to differ only in
> places where bugs have been fixed. Ideally there will be no false positives
> from instruction rearrangement or differing register allocations.
I'll try this. Thanks for the suggestion!
Assignee | ||
Comment 73•12 years ago
|
||
latest push has unrelated xpcshell test failures. Running tests locally before asking for moreinfo (uplift to nss).
Comment 74•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #72)
> I'll try this. Thanks for the suggestion!
So I did that, and found that only two files had changed (after removing spurious changes such as different objdir folder). One is pkcs11.s and the other is mpmontg.s. The changes to mpmontg.s look kind of worrisome because a large part of the file is just gone. I'm attaching the before/after.
Also as a sanity check I went through all the .o files generated in the objdir and ensured that I had the .s files that corresponded to each of them (i.e. to make sure I didn't miss any potential differences). Files compiled with the host-compiler showed up on the list, as did files that were generated from .lo scripts or .S files in the source tree. After removing all of those, there were still a few files that I was missing the .s files for and couldn't figure out where they were being generated from, so those files could potentially also be affected. I've included a list of those in the attachment as well.
Comment 75•12 years ago
|
||
The compiler-original output for mpmontg is cut in the middle of a function, that seems implausible.
Comment 76•12 years ago
|
||
So it is. I didn't notice the .s file was exactly 4096 bytes long either. It must not have gotten rebuilt properly when I ctrl-c'd an earlier build. I rebuilt that file just now and verified that there is no difference with the patched and unpatched compiler. I also figured out where the unknown .o files were coming from (mostly I was just looking for the .s in the wrong directory) and verified those are also unchanged. Looks like it's just pkcs11.c that is affected by this compiler bug.
Comment 77•12 years ago
|
||
Looks like we're good to go with that workaround then :)
Comment 78•12 years ago
|
||
I sent email to nss-dev to give people a heads-up about this landing in mozilla-beta and other branches as a private patch, and I will do the landing later today if I don't hear anything back from the module owners.
Comment 79•12 years ago
|
||
Comment on attachment 721915 [details] [diff] [review]
alt-patch
Review of attachment 721915 [details] [diff] [review]:
-----------------------------------------------------------------
::: security/nss/lib/softoken/pkcs11.c
@@ +1901,5 @@
> + */
> +#if defined (__arm__) && defined (__GNUC__)
> + *crvp = CKR_HOST_MEMORY;
> +#endif
> + break;
You should check in this patch in the NSS Hg repository as well.
Is this extra break statement necessary to work around the compiler bug?
Assignee | ||
Comment 80•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #79)
> Comment on attachment 721915 [details] [diff] [review]
> alt-patch
>
> Review of attachment 721915 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: security/nss/lib/softoken/pkcs11.c
> @@ +1901,5 @@
> > + */
> > +#if defined (__arm__) && defined (__GNUC__)
> > + *crvp = CKR_HOST_MEMORY;
> > +#endif
> > + break;
>
> You should check in this patch in the NSS Hg repository as well.
>
> Is this extra break statement necessary to work around the compiler bug?
Yes, I have also tested with the break inside the ifdef and that works well too.
Comment 81•12 years ago
|
||
If you land this in NSS hg, you should change the comment to say it's a gcc 4.6 problem when targetting arm (but not thumb).
Assignee | ||
Comment 82•12 years ago
|
||
waiting for upstream nss landing
Comment 83•12 years ago
|
||
I moved the break statement into the ifdef block.
I added the gcc version info as Mike suggested and
fixed the indentation in Camilo's patch. Thanks.
Attachment #725057 -
Flags: superreview?(cviecco)
Attachment #725057 -
Flags: review?(mh+mozilla)
Comment 84•12 years ago
|
||
Comment on attachment 725057 [details] [diff] [review]
Patch for the NSS hg repository, by Camilo Viecco
I don't think you need to mention NDK. Especially since this also affects Linux arm distros. Also, as mentioned in comment 62, you may add a !defined(__thumb__)
Attachment #725057 -
Flags: review?(mh+mozilla) → review+
Comment 85•12 years ago
|
||
Mike: I made the changes you suggested. Is this what you have
in mind? Thanks.
Attachment #725057 -
Attachment is obsolete: true
Attachment #725057 -
Flags: superreview?(cviecco)
Attachment #725060 -
Flags: superreview?(cviecco)
Attachment #725060 -
Flags: review?(mh+mozilla)
Comment 86•12 years ago
|
||
Comment on attachment 725060 [details] [diff] [review]
Patch for the NSS hg repository, v2, by Camilo Viecco
Review of attachment 725060 [details] [diff] [review]:
-----------------------------------------------------------------
::: lib/softoken/pkcs11.c
@@ +1896,5 @@
> NSSLOWKEY_EC_PRIVATE_KEY_VERSION);
> + if (rv != SECSuccess) {
> + crv = CKR_HOST_MEMORY;
> + /* The following ifdef is needed for Linux arm distros and
> + * Android if using ARMv6 as gcc 4.6 has a bug when targeting
s/if using ARMv6// ; you can even remove "Linux arm distros and Android" if you feel like it, because netbsd and other systems also supports arm, although they might not be using gcc 4.6.
Attachment #725060 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 87•12 years ago
|
||
you might want to add the gcc bug link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56561
also running latest patch on try:
https://tbpl.mozilla.org/?tree=Try&rev=67f1728e23e0
(eta 2hrs)
Comment 88•12 years ago
|
||
Removed "if using ARMv6" and added the gcc bug report URL.
Patch pushed to the NSS hg repository (NSS 3.15):
http://hg.mozilla.org/projects/nss/rev/a223a1cc6b5f
Attachment #725060 -
Attachment is obsolete: true
Attachment #725060 -
Flags: superreview?(cviecco)
Attachment #725119 -
Flags: checked-in+
Comment 89•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #88)
> Created attachment 725119 [details] [diff] [review]
> Patch for the NSS hg repository, v3, by Camilo Viecco
>
> Removed "if using ARMv6" and added the gcc bug report URL.
>
> Patch pushed to the NSS hg repository (NSS 3.15):
> http://hg.mozilla.org/projects/nss/rev/a223a1cc6b5f
Given this, can we get an landing to m-c/m-a/m-b? Our second to last FF20 beta is going to build tomorrow, so I think we can move forward with https://bugzilla.mozilla.org/show_bug.cgi?id=832942#c78
Flags: needinfo?(bsmith)
Comment 90•12 years ago
|
||
Yes, please use the upstream NSS patch (attachment 725119 [details] [diff] [review]) as the
private patch in mozilla-central.
Comment 91•12 years ago
|
||
Comment 92•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Priority: -- → P1
Target Milestone: --- → 3.15
Comment 93•12 years ago
|
||
Verified fixed on m-c as per original steps-to-reproduce
Tested via: Alcatel One-Touch 958 (Android 2.3, ARMv6), HTC Status (Android 2.3, ARMv6), Samsung Galaxy SII (Android 4.0.4, ARMv7 w/ARMv6 APK).
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•