Closed
Bug 694325
Opened 14 years ago
Closed 13 years ago
Android DNS is single threaded
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla11
People
(Reporter: mcmanus, Assigned: sworkman)
References
Details
Crash Data
Attachments
(2 files, 5 obsolete files)
|
367.21 KB,
patch
|
mcmanus
:
review+
mwu
:
review+
|
Details | Diff | Splinter Review |
|
66.96 KB,
patch
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #687367 +++
As a consequence of the workaround in bug 687367 there is no concurrency in DNS resolution on android. Prefetch is disabled on that platform too, in order to take some pressure off the single threaded resolver.
This is a particularly bad problem in a high latency mobile environment.
m.k.edwards has a patch to restore concurrency here that needed to be backed out due to crashes. https://hg.mozilla.org/mozilla-central/rev/c3a50afc2243
| Reporter | ||
Updated•14 years ago
|
| Assignee | ||
Updated•14 years ago
|
Assignee: nobody → sjhworkman
| Assignee | ||
Comment 1•14 years ago
|
||
Bypass pre-Honeycomb (pre v3.0) implementation of getaddrinfo to provide thread-safety for relevant stdio calls:
- implement __wrap_getaddrinfo
- set symbols in configure.in so calls to getaddrinfo are translated to __wrap_getaddrinfo
- change _files_getaddrinfo to use mmap/open/gets/close rather than fopen/fgets/fclose
- pull in supporting files from bionic (gingerbread, v2.3) to avoid crashes (see bug 687367, bug 693103) on (HTC) devices (14 files in total; getaddrinfo.c + 5 source files, 8 headers) (Note: diff to changes to these files in next attachment).
Attachment #575070 -
Flags: review?(mwu)
Attachment #575070 -
Flags: review?(mcmanus)
| Assignee | ||
Comment 2•14 years ago
|
||
Diff detailing changes to files pulled in from Android v2.3 Bionic.
Clarification: main diff contains files with changes already applied. This diff is for easy review only.
Note: Most of these files are included to deal with the crash see in bug 693103. Differences between the libc version of "struct res_state" and the version in these files seemed to be causing the crash. It took all these files to isolate the use of this structure so that a single version of its layout is used in the callgraph.
Attachment #575071 -
Flags: review?(mwu)
Attachment #575071 -
Flags: review?(mcmanus)
| Assignee | ||
Comment 3•14 years ago
|
||
FYI, I've tested the patch on several Android devices as follows:
Samsung Infuse running 2.2.1
Samsung Galaxy S running 2.3.4
HTC Evo 4g running 2.3.3.
No crashes seen on these devices.
Comment 4•14 years ago
|
||
(In reply to Steve Workman [:sworkman] from comment #3)
> FYI, I've tested the patch on several Android devices as follows:
> HTC Evo 4g running 2.3.3.
Steve: I'm not sure if my phone should have been included as well. My phone is an HTC Sensation 4G running 2.3.4.
| Reporter | ||
Comment 5•14 years ago
|
||
Comment on attachment 575071 [details] [diff] [review]
v1.0 diff; changes to files pulled in
Review of attachment 575071 [details] [diff] [review]:
-----------------------------------------------------------------
There's nothing for me to make a r? judgment on here so I'm clearing the flag; this was just background information - and I really appreciated it!
Attachment #575071 -
Flags: review?(mcmanus)
| Reporter | ||
Comment 6•14 years ago
|
||
Comment on attachment 575070 [details] [diff] [review]
v1.0 diff
Review of attachment 575070 [details] [diff] [review]:
-----------------------------------------------------------------
Steve, I think this is great. I can only provide the r+ for the bits in netwerk that turn parallelism back on, but I've read the whole thing and I really think this is the right approach. I appreciate that you revert to system behavior when it is safe again with honeycomb. This will be a competitive advantage on android for us.
r=mcmanus for netwerk/*
Attachment #575070 -
Flags: review?(mcmanus) → review+
| Assignee | ||
Comment 7•14 years ago
|
||
(In reply to Jared Wein [:jwein and :jaws] from comment #4)
> (In reply to Steve Workman [:sworkman] from comment #3)
> > FYI, I've tested the patch on several Android devices as follows:
> > HTC Evo 4g running 2.3.3.
>
> Steve: I'm not sure if my phone should have been included as well. My phone
> is an HTC Sensation 4G running 2.3.4.
Jared, thanks for adding that. I didn't include your phone just because the final patch uploaded here was a little different from the one I used on your phone earlier in the day.
| Assignee | ||
Comment 8•14 years ago
|
||
Thanks for the r+ Patrick. I should clarify that I started with the last big patch in bug 687367, so the pre/post-honeycomb decision in the code was not something I added, but came from MK Edwards. In any case, yes, it's good behavior.
Comment 9•14 years ago
|
||
Comment on attachment 575071 [details] [diff] [review]
v1.0 diff; changes to files pulled in
>@@ -1361,18 +1504,16 @@ static int
> _get_label(const struct sockaddr *addr)
> {
> if (addr->sa_family == AF_INET) {
> return 3;
> } else if (addr->sa_family == AF_INET6) {
> const struct sockaddr_in6 *addr6 = (const struct sockaddr_in6 *)addr;
> if (IN6_IS_ADDR_LOOPBACK(&addr6->sin6_addr)) {
> return 0;
>- } else if (IN6_IS_ADDR_ULA(&addr6->sin6_addr)) {
>- return 1;
Why was this removed?
Overall this looks mostly right, though it seems strange that so many symbols had __wrap prefixed. Most of these symbols shouldn't be exported in the first place, so we shouldn't have to prefix them to avoid collision. Prefixing __wrap is also a bit weird when it's not actually a symbol that libxul is accessing through the linker wrapping/renaming.
Attachment #575071 -
Flags: review?(mwu)
Comment 10•14 years ago
|
||
Comment on attachment 575070 [details] [diff] [review]
v1.0 diff
Review of attachment 575070 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good overall. Just adjust configure.in and maybe remove/rename some of the __wrap prefixing if you can.
::: configure.in
@@ +7262,5 @@
> WRAP_LDFLAGS="${WRAP_LDFLAGS} -L$_objdir/dist/lib -lmozutils"
> if test "$MOZ_WIDGET_TOOLKIT" = android; then
> WRAP_LDFLAGS="${WRAP_LDFLAGS} -Wl,--wrap=dlopen,--wrap=dlclose,--wrap=dlerror,--wrap=dlsym,--wrap=dladdr"
> fi
> + WRAP_LDFLAGS="${WRAP_LDFLAGS} -Wl,--wrap=getaddrinfo,--wrap=freeaddrinfo,--wrap=gai_strerror"
Can you move this inside the android toolkit check? (lines 7263-7265)
Gonk (OS_TARGET=Android, MOZ_WIDGET_TOOLKIT=gonk) doesn't build mozutils from other-licenses/android.
Attachment #575070 -
Flags: review?(mwu) → review+
| Assignee | ||
Comment 11•14 years ago
|
||
Thanks for the comments, Michael.
Re comment 9, please see MK Edwards original comment on this in bug 687367 comment 28:
"That macro isn't defined by the older headers in the NDK we're using. It's a side effect of another patch that was already applied to the version of getaddrinfo.c I started from; we want the rest of that patch."
Re configure.in, no problem. I wasn't sure about that one myself, so I can fix that one easily.
Re __wrap prefixing, I guess my approach was to be on the cautious side and just prefix any non-static functions in the files brought in. With respect to those functions not accessed through linker-wrapping/renaming, again, I was staying on the cautious side of things - these functions may not be called directly from libxul, but they are part of the call graph from __wrap_getaddrinfo etc. Let me take another look over this and see what I can come up with.
| Assignee | ||
Comment 12•14 years ago
|
||
Carrying forward r+
Attachment #575070 -
Attachment is obsolete: true
Attachment #575386 -
Flags: review+
| Assignee | ||
Comment 13•14 years ago
|
||
Attachment #575071 -
Attachment is obsolete: true
Attachment #575387 -
Flags: review?(mwu)
Comment 14•14 years ago
|
||
Comment on attachment 575387 [details] [diff] [review]
v1.1 diff; changes to files pulled in; most __wrap prefixes removed
Looks good. r=me on the full patch.
Attachment #575387 -
Flags: review?(mwu)
| Assignee | ||
Comment 15•13 years ago
|
||
Try results:
https://tbpl.mozilla.org/?tree=Try&rev=9be31b71f8f4
All seems to be fine. (Note: I had issues on Friday getting try to run for the Android build; hence the delay in getting these results).
Re Performance, I checked tp4m and tp4m_nochrome: results seem to have landed within recently observed ranges.
| Assignee | ||
Comment 16•13 years ago
|
||
Removed try instructions so patch is ready for landing on inbound.
Attachment #575386 -
Attachment is obsolete: true
Attachment #576008 -
Flags: review+
| Assignee | ||
Updated•13 years ago
|
Keywords: checkin-needed
Comment 17•13 years ago
|
||
| Assignee | ||
Comment 18•13 years ago
|
||
Thanks, Doug.
Comment 19•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
Comment 20•13 years ago
|
||
We believe this patch is causing crashes on tablets:
E/AndroidRuntime(27160): java.lang.UnsatisfiedLinkError: Cannot load library: reloc_library[1315]: 87 cannot locate '_res_opcodes'...
| Assignee | ||
Comment 21•13 years ago
|
||
Discussed with DougT on IRC. Crash is happening on birch. So, mozilla-central is unaffected at the moment, but this will cause problems with the merge next week. So, let's back it out and get the def for the symbol added to the patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla11 → ---
Comment 22•13 years ago
|
||
Comment 23•13 years ago
|
||
+extern const char * const _res_opcodes[];
+extern const char * const _res_sectioncodes[];
These variables are defined extern and never defined in our code, which means we rely on the system library containing them. This is dangerous, as subtle binary compatibility changes in system library can break our code (and surely, this is what happens on tablets)
Comment 24•13 years ago
|
||
Even simpler, the _res_opcodes symbol is not exported from libc.so in honeycomb.
# objdump -T libc.so.honeycomb | grep _res_opcode
# objdump -T libc.so.gingerbread | grep _res_opcode
00040838 g DO .data.rel.ro 00000040 _res_opcodes
| Assignee | ||
Comment 25•13 years ago
|
||
Pulled in more files to ensure symbols are either defined in libmozutils via these files, or explicitly exposed in the NDK headers.
Tested XUL and Birch builds on:
Samsung Gal Tab 10.1 (Android v3.2)
Asus Tab (v3.2.1)
HTC Evo 3D (v2.3.4 with Sense v3.0)
Samsung Gal S (v2.3.4)
AT&T (Huawei) Impulse (v2.2.2)
... to cover the original case, crash observed on HTC devices, and crash observed on Honeycomb tablets.
Attachment #576008 -
Attachment is obsolete: true
Attachment #578463 -
Flags: review?(mwu)
Attachment #578463 -
Flags: review?(mcmanus)
| Assignee | ||
Comment 26•13 years ago
|
||
Changes made to the files pulled in. Tried to #ifdef out functions that are not called in libmozutils, using "#define MOZILLA_NECKO_EXCLUDE_CODE".
Attachment #575387 -
Attachment is obsolete: true
| Reporter | ||
Comment 27•13 years ago
|
||
Comment on attachment 578463 [details] [diff] [review]
v1.3 diff; ensure undefined symbols exposed in Bionic
Review of attachment 578463 [details] [diff] [review]:
-----------------------------------------------------------------
I believe the necko bits are the same as last time, right? Assuming that's true you can just carry fwd the old r+ for that part, no need to ask again.
Attachment #578463 -
Flags: review?(mcmanus) → review+
| Assignee | ||
Comment 28•13 years ago
|
||
You're right, Patrick, apologies. No Necko changes.
Michael, if you can still review the new files added, that would be great.
Comment 29•13 years ago
|
||
Comment on attachment 578463 [details] [diff] [review]
v1.3 diff; ensure undefined symbols exposed in Bionic
Looked over the diff to the imported files and it looks ok to me. A lot of files added but I guess we don't have a choice.
Attachment #578463 -
Flags: review?(mwu) → review+
| Assignee | ||
Comment 30•13 years ago
|
||
Thanks Michael. Agreed re a lot of files. However, as you say, there is no way around it. I hope #ifdef'ing out some of the code makes a little difference to the size of libmozutils.
| Assignee | ||
Comment 31•13 years ago
|
||
Sorry for the delay on "checkin-needed". I wanted to make sure I had a good try run first, and there were a few issues last week, namely everything looked ok except mochitest-other first of all, then Android builds were weird on try server. Mochitest-other was fixed by rebasing and pushing again:
https://tbpl.mozilla.org/?tree=Try&rev=d921dcb621a5
Android builds are broken all over, but I have a good try run using the same patch here:
https://tbpl.mozilla.org/?tree=Try&rev=9df027fce4c3
So, rather than waiting any more, I think it's time to try it on inbound. I also wanted to wait to be around to monitor inbound for a while.
| Assignee | ||
Updated•13 years ago
|
Keywords: checkin-needed
Comment 32•13 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
You need to log in
before you can comment on or make changes to this bug.
Description
•