Last Comment Bug 694325 - Android DNS is single threaded
: Android DNS is single threaded
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: 10 Branch
: ARM Android
: -- major (vote)
: mozilla11
Assigned To: Steve Workman [:sworkman] (INACTIVE)
:
Mentors:
Depends on: 687367 693103 693280
Blocks: 689989 704639 958010
  Show dependency treegraph
 
Reported: 2011-10-13 08:08 PDT by Patrick McManus [:mcmanus]
Modified: 2014-01-10 09:48 PST (History)
26 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
v1.0 diff (236.10 KB, patch)
2011-11-16 18:42 PST, Steve Workman [:sworkman] (INACTIVE)
mcmanus: review+
mwu.code: review+
Details | Diff | Splinter Review
v1.0 diff; changes to files pulled in (140.10 KB, patch)
2011-11-16 18:49 PST, Steve Workman [:sworkman] (INACTIVE)
no flags Details | Diff | Splinter Review
v1.1 diff; removes most __wrap prefixes (234.81 KB, patch)
2011-11-17 20:59 PST, Steve Workman [:sworkman] (INACTIVE)
sjhworkman: review+
Details | Diff | Splinter Review
v1.1 diff; changes to files pulled in; most __wrap prefixes removed (31.78 KB, patch)
2011-11-17 21:00 PST, Steve Workman [:sworkman] (INACTIVE)
no flags Details | Diff | Splinter Review
v1.2 diff; removed try instructions from header (234.84 KB, patch)
2011-11-21 15:50 PST, Steve Workman [:sworkman] (INACTIVE)
sjhworkman: review+
Details | Diff | Splinter Review
v1.3 diff; ensure undefined symbols exposed in Bionic (367.21 KB, patch)
2011-12-01 18:18 PST, Steve Workman [:sworkman] (INACTIVE)
mcmanus: review+
mwu.code: review+
Details | Diff | Splinter Review
v1.3 diff; changes to files pulled in; ensure undefined symbols exposed in Bionic (66.96 KB, patch)
2011-12-01 18:20 PST, Steve Workman [:sworkman] (INACTIVE)
no flags Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] 2011-10-13 08:08:44 PDT
+++ This bug was initially created as a clone of Bug #687367 +++

As a consequence of the workaround in bug 687367 there is no concurrency in DNS resolution on android. Prefetch is disabled on that platform too, in order to take some pressure off the single threaded resolver.

This is a particularly bad problem in a high latency mobile environment.

m.k.edwards has a patch to restore concurrency here that needed to be backed out due to crashes. https://hg.mozilla.org/mozilla-central/rev/c3a50afc2243
Comment 1 Steve Workman [:sworkman] (INACTIVE) 2011-11-16 18:42:17 PST
Created attachment 575070 [details] [diff] [review]
v1.0 diff

Bypass pre-Honeycomb (pre v3.0) implementation of getaddrinfo to provide thread-safety for relevant stdio calls:
- implement __wrap_getaddrinfo
- set symbols in configure.in so calls to getaddrinfo are translated to __wrap_getaddrinfo
- change _files_getaddrinfo to use mmap/open/gets/close rather than fopen/fgets/fclose
- pull in supporting files from bionic (gingerbread, v2.3) to avoid crashes (see bug 687367, bug 693103) on (HTC) devices (14 files in total; getaddrinfo.c + 5 source files, 8 headers) (Note: diff to changes to these files in next attachment).
Comment 2 Steve Workman [:sworkman] (INACTIVE) 2011-11-16 18:49:26 PST
Created attachment 575071 [details] [diff] [review]
v1.0 diff; changes to files pulled in

Diff detailing changes to files pulled in from Android v2.3 Bionic.
Clarification: main diff contains files with changes already applied.  This diff is for easy review only.

Note: Most of these files are included to deal with the crash see in bug 693103.  Differences between the libc version of "struct res_state" and the version in these files seemed to be causing the crash.  It took all these files to isolate the use of this structure so that a single version of its layout is used in the callgraph.
Comment 3 Steve Workman [:sworkman] (INACTIVE) 2011-11-16 18:53:13 PST
FYI, I've tested the patch on several Android devices as follows:
Samsung Infuse running 2.2.1
Samsung Galaxy S running 2.3.4
HTC Evo 4g running 2.3.3.

No crashes seen on these devices.
Comment 4 Jared Wein [:jaws] (please needinfo? me) 2011-11-16 19:00:05 PST
(In reply to Steve Workman [:sworkman] from comment #3)
> FYI, I've tested the patch on several Android devices as follows:
> HTC Evo 4g running 2.3.3.

Steve: I'm not sure if my phone should have been included as well. My phone is an HTC Sensation 4G running 2.3.4.
Comment 5 Patrick McManus [:mcmanus] 2011-11-17 07:30:47 PST
Comment on attachment 575071 [details] [diff] [review]
v1.0 diff; changes to files pulled in

Review of attachment 575071 [details] [diff] [review]:
-----------------------------------------------------------------

There's nothing for me to make a r? judgment on here so I'm clearing the flag; this was just background information - and I really appreciated it!
Comment 6 Patrick McManus [:mcmanus] 2011-11-17 07:32:22 PST
Comment on attachment 575070 [details] [diff] [review]
v1.0 diff

Review of attachment 575070 [details] [diff] [review]:
-----------------------------------------------------------------

Steve, I think this is great. I can only provide the r+ for the bits in netwerk that turn parallelism back on, but I've read the whole thing and I really think this is the right approach. I appreciate that you revert to system behavior when it is safe again with honeycomb. This will be a competitive advantage on android for us.

r=mcmanus for netwerk/*
Comment 7 Steve Workman [:sworkman] (INACTIVE) 2011-11-17 10:45:37 PST
(In reply to Jared Wein [:jwein and :jaws] from comment #4)
> (In reply to Steve Workman [:sworkman] from comment #3)
> > FYI, I've tested the patch on several Android devices as follows:
> > HTC Evo 4g running 2.3.3.
> 
> Steve: I'm not sure if my phone should have been included as well. My phone
> is an HTC Sensation 4G running 2.3.4.

Jared, thanks for adding that.  I didn't include your phone just because the final patch uploaded here was a little different from the one I used on your phone earlier in the day.
Comment 8 Steve Workman [:sworkman] (INACTIVE) 2011-11-17 10:49:31 PST
Thanks for the r+ Patrick.  I should clarify that I started with the last big patch in bug 687367, so the pre/post-honeycomb decision in the code was not something I added, but came from MK Edwards.  In any case, yes, it's good behavior.
Comment 9 Michael Wu [:mwu] 2011-11-17 12:51:14 PST
Comment on attachment 575071 [details] [diff] [review]
v1.0 diff; changes to files pulled in

>@@ -1361,18 +1504,16 @@ static int
> _get_label(const struct sockaddr *addr)
> {
>     if (addr->sa_family == AF_INET) {
>         return 3;
>     } else if (addr->sa_family == AF_INET6) {
>         const struct sockaddr_in6 *addr6 = (const struct sockaddr_in6 *)addr;
>         if (IN6_IS_ADDR_LOOPBACK(&addr6->sin6_addr)) {
>             return 0;
>-        } else if (IN6_IS_ADDR_ULA(&addr6->sin6_addr)) {
>-            return 1;

Why was this removed?

Overall this looks mostly right, though it seems strange that so many symbols had __wrap prefixed. Most of these symbols shouldn't be exported in the first place, so we shouldn't have to prefix them to avoid collision. Prefixing __wrap is also a bit weird when it's not actually a symbol that libxul is accessing through the linker wrapping/renaming.
Comment 10 Michael Wu [:mwu] 2011-11-17 12:55:56 PST
Comment on attachment 575070 [details] [diff] [review]
v1.0 diff

Review of attachment 575070 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good overall. Just adjust configure.in and maybe remove/rename some of the __wrap prefixing if you can.

::: configure.in
@@ +7262,5 @@
>      WRAP_LDFLAGS="${WRAP_LDFLAGS} -L$_objdir/dist/lib -lmozutils"
>      if test "$MOZ_WIDGET_TOOLKIT" = android; then
>          WRAP_LDFLAGS="${WRAP_LDFLAGS} -Wl,--wrap=dlopen,--wrap=dlclose,--wrap=dlerror,--wrap=dlsym,--wrap=dladdr"
>      fi
> +    WRAP_LDFLAGS="${WRAP_LDFLAGS} -Wl,--wrap=getaddrinfo,--wrap=freeaddrinfo,--wrap=gai_strerror"

Can you move this inside the android toolkit check? (lines 7263-7265)

Gonk (OS_TARGET=Android, MOZ_WIDGET_TOOLKIT=gonk) doesn't build mozutils from other-licenses/android.
Comment 11 Steve Workman [:sworkman] (INACTIVE) 2011-11-17 13:32:08 PST
Thanks for the comments, Michael.

Re comment 9, please see MK Edwards original comment on this in bug 687367 comment 28:

"That macro isn't defined by the older headers in the NDK we're using.  It's a side effect of another patch that was already applied to the version of getaddrinfo.c I started from; we want the rest of that patch."

Re configure.in, no problem.  I wasn't sure about that one myself, so I can fix that one easily.

Re __wrap prefixing, I guess my approach was to be on the cautious side and just prefix any non-static functions in the files brought in.  With respect to those functions not accessed through linker-wrapping/renaming, again, I was staying on the cautious side of things - these functions may not be called directly from libxul, but they are part of the call graph from __wrap_getaddrinfo etc.  Let me take another look over this and see what I can come up with.
Comment 12 Steve Workman [:sworkman] (INACTIVE) 2011-11-17 20:59:07 PST
Created attachment 575386 [details] [diff] [review]
v1.1 diff; removes most __wrap prefixes

Carrying forward r+
Comment 13 Steve Workman [:sworkman] (INACTIVE) 2011-11-17 21:00:35 PST
Created attachment 575387 [details] [diff] [review]
v1.1 diff; changes to files pulled in; most __wrap prefixes removed
Comment 14 Michael Wu [:mwu] 2011-11-17 21:07:12 PST
Comment on attachment 575387 [details] [diff] [review]
v1.1 diff; changes to files pulled in; most __wrap prefixes removed

Looks good. r=me on the full patch.
Comment 15 Steve Workman [:sworkman] (INACTIVE) 2011-11-21 15:48:08 PST
Try results:
    https://tbpl.mozilla.org/?tree=Try&rev=9be31b71f8f4

All seems to be fine.  (Note: I had issues on Friday getting try to run for the Android build; hence the delay in getting these results).

Re Performance, I checked tp4m and tp4m_nochrome: results seem to have landed within recently observed ranges.
Comment 16 Steve Workman [:sworkman] (INACTIVE) 2011-11-21 15:50:05 PST
Created attachment 576008 [details] [diff] [review]
v1.2 diff; removed try instructions from header

Removed try instructions so patch is ready for landing on inbound.
Comment 18 Steve Workman [:sworkman] (INACTIVE) 2011-11-21 18:31:56 PST
Thanks, Doug.
Comment 19 Ed Morley [:emorley] 2011-11-22 09:09:05 PST
https://hg.mozilla.org/mozilla-central/rev/bfb56029f4bd
Comment 20 Doug Turner (:dougt) 2011-11-22 14:42:33 PST
We believe this patch is causing crashes on tablets:


E/AndroidRuntime(27160): java.lang.UnsatisfiedLinkError: Cannot load library: reloc_library[1315]:    87 cannot locate '_res_opcodes'...
Comment 21 Steve Workman [:sworkman] (INACTIVE) 2011-11-22 14:58:51 PST
Discussed with DougT on IRC.  Crash is happening on birch.  So, mozilla-central is unaffected at the moment, but this will cause problems with the merge next week.  So, let's back it out and get the def for the symbol added to the patch.
Comment 22 Doug Turner (:dougt) 2011-11-22 15:02:01 PST
Backed out : https://hg.mozilla.org/mozilla-central/rev/a9a0abc25480
Comment 23 Mike Hommey [:glandium] 2011-11-22 22:27:06 PST
+extern const char * const _res_opcodes[];
+extern const char * const _res_sectioncodes[];

These variables are defined extern and never defined in our code, which means we rely on the system library containing them. This is dangerous, as subtle binary compatibility changes in system library can break our code (and surely, this is what happens on tablets)
Comment 24 Mike Hommey [:glandium] 2011-11-22 22:44:07 PST
Even simpler, the _res_opcodes symbol is not exported from libc.so in honeycomb.

# objdump -T libc.so.honeycomb | grep _res_opcode
# objdump -T libc.so.gingerbread | grep _res_opcode
00040838 g    DO .data.rel.ro   00000040 _res_opcodes
Comment 25 Steve Workman [:sworkman] (INACTIVE) 2011-12-01 18:18:05 PST
Created attachment 578463 [details] [diff] [review]
v1.3 diff; ensure undefined symbols exposed in Bionic

Pulled in more files to ensure symbols are either defined in libmozutils via these files, or explicitly exposed in the NDK headers.

Tested XUL and Birch builds on:
  Samsung Gal Tab 10.1 (Android v3.2)
  Asus Tab (v3.2.1)
  HTC Evo 3D (v2.3.4 with Sense v3.0)
  Samsung Gal S (v2.3.4)
  AT&T (Huawei) Impulse (v2.2.2)
... to cover the original case, crash observed on HTC devices, and crash observed on Honeycomb tablets.
Comment 26 Steve Workman [:sworkman] (INACTIVE) 2011-12-01 18:20:49 PST
Created attachment 578464 [details] [diff] [review]
v1.3 diff; changes to files pulled in; ensure undefined symbols exposed in Bionic

Changes made to the files pulled in.  Tried to #ifdef out functions that are not called in libmozutils, using "#define MOZILLA_NECKO_EXCLUDE_CODE".
Comment 27 Patrick McManus [:mcmanus] 2011-12-02 06:56:01 PST
Comment on attachment 578463 [details] [diff] [review]
v1.3 diff; ensure undefined symbols exposed in Bionic

Review of attachment 578463 [details] [diff] [review]:
-----------------------------------------------------------------

I believe the necko bits are the same as last time, right? Assuming that's true you can just carry fwd the old r+ for that part, no need to ask again.
Comment 28 Steve Workman [:sworkman] (INACTIVE) 2011-12-02 10:49:40 PST
You're right, Patrick, apologies.  No Necko changes.

Michael, if you can still review the new files added, that would be great.
Comment 29 Michael Wu [:mwu] 2011-12-02 11:29:17 PST
Comment on attachment 578463 [details] [diff] [review]
v1.3 diff; ensure undefined symbols exposed in Bionic

Looked over the diff to the imported files and it looks ok to me. A lot of files added but I guess we don't have a choice.
Comment 30 Steve Workman [:sworkman] (INACTIVE) 2011-12-02 11:32:26 PST
Thanks Michael.  Agreed re a lot of files.  However, as you say, there is no way around it.  I hope #ifdef'ing out some of the code makes a little difference to the size of libmozutils.
Comment 31 Steve Workman [:sworkman] (INACTIVE) 2011-12-05 10:33:21 PST
Sorry for the delay on "checkin-needed".  I wanted to make sure I had a good try run first, and there were a few issues last week, namely everything looked ok except mochitest-other first of all, then Android builds were weird on try server.  Mochitest-other was fixed by rebasing and pushing again:
    https://tbpl.mozilla.org/?tree=Try&rev=d921dcb621a5

Android builds are broken all over, but I have a good try run using the same patch here:
    https://tbpl.mozilla.org/?tree=Try&rev=9df027fce4c3

So, rather than waiting any more, I think it's time to try it on inbound.  I also wanted to wait to be around to monitor inbound for a while.
Comment 32 Ed Morley [:emorley] 2011-12-06 10:06:43 PST
https://hg.mozilla.org/mozilla-central/rev/7047e1d7c6de

Note You need to log in before you can comment on or make changes to this bug.