Closed Bug 871475 Opened 7 years ago Closed 6 years ago

B2G Emulator: support RIL data connection emulation

Categories

(Firefox OS Graveyard :: Emulator, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: vicamo, Assigned: vicamo)

References

()

Details

Attachments

(5 files, 3 obsolete files)

Emulator has only dummy PDP management now. We can't verify setup parameters because it always returns success, and there is no network interface ever bring up or down, can't actually test routing rules and there is no networking at all.

To have some more test cases on RIL data connection, we'll need a in kernel driver to initialize RIL net devices and hardware emulation on emulator host side as well. We should also gain control to the established pdp context, namely disconnect it.

See also 3GPP TS 27.060 for general information, 3GPP TS 27.007 clause 10 for related AT commands, 3GPP TS 23.060 for detailed GPRS reference.
Blocks: b2g-emulator
Component: General → Emulator
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #0)
> To have some more test cases on RIL data connection, we'll need a in kernel
> driver to initialize RIL net devices and hardware emulation on emulator host
> side as well.

Looks like we can re-use existing SMC91C111 ethernet driver for this because the thing we really need is an valid network interface, the name 'rmnet<N>' (actually emulator returns 'eth0') is not so much important, so we don't really want a new kernel driver for this.  With SMC91C111 NICs, we will have eth1, eth2, ... instead.

We'll also need some way to bring emulated SMC91C111 link up and down to simulate data connection activate/deactivate.

> See also 3GPP TS 27.060 for general information, 3GPP TS 27.007 clause 10
> for related AT commands, 3GPP TS 23.060 for detailed GPRS reference.

"AT+CGCONTRDP" can serve as a official command for necessary entries in RIL_UNSOL_DATA_CALL_LIST_CHANGED/RIL_REQUEST_DATA_CALL_LIST except kernel ifname.
Assignee: nobody → vyang
Duplicate of this bug: 946311
Attached file Github pull request for hardware/ril (obsolete) —
Use AT+CGCONTRDP to retrieve dynamic properties such as allocated IP, gateway address, DNSs.
Attachment #8345237 - Flags: review?(htsai)
Attached file Github PR for external/qemu (obsolete) —
Part-1 is a minor fix to clarify the meaning of 'id'.

Part-2 is mostly about the <local_addr> field returned in AT+CGDCONT.  Its read form is actually obsoleted in the previous patch for hardware/ril (attachment  8345237 [details]).  But its test form might be used in bug 821578.

Part-3 is to add additional NICs for dial-up access.  Currently I create 4 in total even with multi-sim configuration (Emulator has always 9 modems).

Part-4, for SMC91C111 datasheet, please google "SMC91C111 datasheet".  The original SMC91C111 qemu implementation has to be patched due to the lack of ability to report link down/up, which is then used as a method to ensure no frame goes out when the connection is down.

Part-5, AOSP emulator has only one user SLIRP domain -- "10.0.2.0/24".  All the dial-up links must then have the same subnet/netmask/gateway/dns.

Part-6, implement the AT+CGCONTRDP command for passing dynamic properties to rild.  There is also a dirty hack to pass kernel iface number in <bearer_id> field.
Attachment #8345239 - Flags: review?(htsai)
Attachment #8345239 - Flags: feedback?(jjong)
The last step to do is to disable default route of eth0 at boot.  This is etched in "system/core/rootdir/etc/init.goldfish.sh".  Since we prefer not to touch system/core, yet another patch for ”init.goldfish.rc“ is required for ICS.

Or, a temporary hack, you may disable eth0 before turning on data connection:
$ adb remount
$ adb push gaia/build/busybox-armv6l /system/bin/busybox
$ adb shell chmod 755 /system/bin/busybox
$ adb shell busybox ifconfig eth0 down

I have to go to the church for a confession before uploading such a patch...
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #7)
> The last step to do is to disable default route of eth0 at boot.

This fails several Marionette test cases with 'Error loading page'.
Marionette      DEBUG   Got request: goUrl, data: {"to":"0","sessionId":"6-b2g","name":"goU    rl","parameters":{"url":"http://10.247.24.42:35252/test.html"}}, id: null
The nature limits of bringing up data connection emulation are:

1) emulator SLIRP supports only one subnet -- 10.0.2.0/24
 => same gateway/dns/... for all outgoing network interfaces

2) we need networking throughout the automation test process.
 => default route must exist, and with 1), must be 10.0.2.2

3) Gecko should not contain emulator specific code:
 => must be able to set default route to eth<N> without additional parameters.

Then the most likely solution is to set default route of eth0 with metric 1:

$ adb shell busybox route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.247.75.5     10.0.2.2        255.255.255.255 UGH   0      0        0 eth1
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth1
0.0.0.0         10.0.2.2        0.0.0.0         UG    0      0        0 eth1
0.0.0.0         10.0.2.2        0.0.0.0         UG    1      0        0 eth0

And it works!
Currently emulator eth0 bringing up is done in "/system/etc/init.goldfish.sh" by:

  route add default gw 10.0.2.2 dev eth0

The `route` command used here is actually a symlink to `toolbox`, and its source code is available in "${B2G_DIR}/system/core/toolbox/route.c".  However, that route command is a very limited version and does not support setting metric value for a route entry.  In order to do that, there are a few possible options:

1) fork "${B2G_DIR}/system/core" and add one line `rt.rt_metric = 2;` to route.c, function route_main.  This follows all default route addition using toolbox will always come with metric 1.

2) add a new utility in "${B2G_DIR}/device/generic/goldfish" that is dedicated for creating routes with additional metric parameters.

3) import busybox and have a patch in "${B2G_DIR}/device/generic/goldfish" that forces the use of busybox in "init.goldfish.sh".  However, this is now blocked by bug 935776 -- emulator is broken on Maverick.

These options are ordered by magnitude of works to do.  3) would be the best, generic and also beneficial for others.
Another method suggested by :Edgar is iproute2 (https://android.googlesource.com/platform/external/iproute2/), which has been imported to AOSP since Froyo.  Sounds like a more easier yet compatible way.
(In reply to Vicamo Yang [:vicamo][:vyang] (PTO Dec. 21 ~ Jan. 5) from comment #12)
> Another method suggested by :Edgar is iproute2
> (https://android.googlesource.com/platform/external/iproute2/), which has
> been imported to AOSP since Froyo.  Sounds like a more easier yet compatible
> way.

Confirmed `ip route add default via 10.0.2.2 dev eth0 metric 2` works.
Blocks: 951053
Comment on attachment 8345239 [details]
Github PR for external/qemu

I have tested the patches and they work well!

Just having some doubts about the _amodem_rmnets struct. Why are we associating mask, gw and dnses with rmnets? This way, we use the same mask, gw and dnses for all data connections.

Another thing is ADataConnection->ip and ADataContext->ip seems not to be synced, if RIL defines an IP using +CGDCONT, it is not reflected in ADataConnection->ip.

Please let me know if I am missing anything. Thanks.
Attachment #8345239 - Flags: feedback?(jjong) → feedback+
(In reply to Jessica Jong [:jjong] [:jessica] from comment #14)
> Comment on attachment 8345239 [details]
> Github PR for external/qemu
> 
> I have tested the patches and they work well!
> 
> Just having some doubts about the _amodem_rmnets struct. Why are we
> associating mask, gw and dnses with rmnets? This way, we use the same mask,
> gw and dnses for all data connections.

Because that's a natural limit of current qemu SLIRP implementation.  It allows only one outgoing subnet 10.0.2.0/24.  So all rmnets have to share the same mask/gw/dns.

> Another thing is ADataConnection->ip and ADataContext->ip seems not to be
> synced, if RIL defines an IP using +CGDCONT, it is not reflected in
> ADataConnection->ip.

See commit messages in https://github.com/vicamo/b2g_platform_external_qemu/commit/0b5f033 .

  The read form of the command will continue to return the null string even if
  an address has been allocated during the PDP startup procedure.

There is no further explanation for non-null PDP_addr.

Besides, two things involved in this case.  First, we never give PDP_addr in the set form of AT+CGDCONT in rild; second, with PR for hardware/ril, we'll no longer use AT+CGDCONT to retrieve dynamic properties like PDP_addr.

> Please let me know if I am missing anything. Thanks.

Thank you for the feedback.
Comment on attachment 8345237 [details]
Github pull request for hardware/ril

Sorry for being so late. Please see comments I dropped on github. Thanks!
Comment on attachment 8345239 [details]
Github PR for external/qemu

Sorry for being so late. Please see comments I dropped on github. Thanks.
Have revised the commits to address your comments.  Please help review again.
Comment on attachment 8345237 [details]
Github pull request for hardware/ril

\o/ Thank you.
Attachment #8345237 - Flags: review?(htsai) → review+
Comment on attachment 8345239 [details]
Github PR for external/qemu

Thanks for the work and comment addressing. :) Awesome!!!
Attachment #8345239 - Flags: review?(htsai) → review+
No longer blocks: 951053
Depends on: 951053
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #15)
> (In reply to Jessica Jong [:jjong] [:jessica] from comment #14)
> > Comment on attachment 8345239 [details]
> > Github PR for external/qemu
> > 
> > I have tested the patches and they work well!
> > 
> > Just having some doubts about the _amodem_rmnets struct. Why are we
> > associating mask, gw and dnses with rmnets? This way, we use the same mask,
> > gw and dnses for all data connections.
> 
> Because that's a natural limit of current qemu SLIRP implementation.  It
> allows only one outgoing subnet 10.0.2.0/24.  So all rmnets have to share
> the same mask/gw/dns.
> 
> > Another thing is ADataConnection->ip and ADataContext->ip seems not to be
> > synced, if RIL defines an IP using +CGDCONT, it is not reflected in
> > ADataConnection->ip.
> 
> See commit messages in
> https://github.com/vicamo/b2g_platform_external_qemu/commit/0b5f033 .
> 
>   The read form of the command will continue to return the null string even
> if
>   an address has been allocated during the PDP startup procedure.
> 
> There is no further explanation for non-null PDP_addr.
> 
> Besides, two things involved in this case.  First, we never give PDP_addr in
> the set form of AT+CGDCONT in rild; second, with PR for hardware/ril, we'll
> no longer use AT+CGDCONT to retrieve dynamic properties like PDP_addr.
> 
> > Please let me know if I am missing anything. Thanks.
> 
> Thank you for the feedback.

I see. Thanks for the clarification. :)
Patches "init.goldfish.sh" to set metric value on eth0.
Attachment #8359725 - Flags: review?(mwu)
Attachment #8359725 - Attachment description: pr → Github pull request for device_generic_goldfish
Verified in jellybean, but DNS resolution failed in KitKat.  "Fail to connect to Netd after retry 10 times" could be the reason.
Just a ping.  We really need some enhancements to emulator networking function to land more test cases to ensure the coming refactoring process doesn't create unexpected regressions.
Flags: needinfo?(mwu)
Comment on attachment 8359725 [details] [review]
Github pull request for device_generic_goldfish

Had been merged by Andreas Gal.  Will go on to merge other PRs and merge to emulator-jb and emulator-kk.
Attachment #8359725 - Flags: review?(mwu)
Flags: needinfo?(mwu)
Found mozilla-b2g GitHub owner policy has changed. Permission to merge was revoked.  Already set NI in bug 970212 for future GitHub landing process.
Keywords: checkin-needed
platform_hardware_ril/master:  94a3764bc3dfc54bde284ba18f6b66514f656aff
platform_external_qemu/master: b345aaae0393057d6d3e11f3ce75a18300bf3fe4

And as noted, gal took care of the device_generic_goldfish/master merge: e7e8734fdd8bf41e48a56c1c85e0f7dac60aaa9f
Status: NEW → RESOLVED
Closed: 6 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S1 (14feb)
This caused a crashtest perma-fail in the B2G emulator. Reverted.
platform_hardware_ril/master:  d11f524d00cacf5ba0dfbf25e4aa2158b1c3a036
platform_external_qemu/master: 022eadd5917615ff00c47eaaafa792b45e9c8a28

https://tbpl.mozilla.org/php/getParsedLog.php?id=34924237&tree=B2g-Inbound

10:46:28     INFO -  REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/tables/crashtests/691824-1.xhtml
10:46:28     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8888/tests/layout/tables/crashtests/691824-1.xhtml | 503 / 864 (58%)
10:46:28     INFO -  REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/tables/crashtests/691824-1.xhtml | load failed: null
10:46:28     INFO -  REFTEST INFO | Saved log: START http://10.0.2.2:8888/tests/layout/tables/crashtests/691824-1.xhtml
10:46:28     INFO -  REFTEST INFO | Loading a blank page
10:46:28     INFO -  REFTEST TEST-END | http://10.0.2.2:8888/tests/layout/tables/crashtests/691824-1.xhtml
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 1.4 S1 (14feb) → ---
And reftest-10 failures.
https://tbpl.mozilla.org/php/getParsedLog.php?id=34923468&tree=B2g-Inbound

10:36:45     INFO -  REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html
10:36:45     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html | 905 / 1473 (61%)
10:36:45     INFO -  REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html | load failed: timed out waiting for reftest-wait to be removed
10:36:45     INFO -  REFTEST INFO | Saved log: START http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] OnDocumentLoad triggering WaitForTestEnd
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners
10:36:45     INFO -  REFTEST INFO | Saved log: Initializing canvas snapshot
10:36:45     INFO -  REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html
10:36:45     INFO -  REFTEST INFO | Saved log: Updating entire canvas for invalidation
10:36:45     INFO -  REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: dispatching MozReftestInvalidate
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL
10:36:45     INFO -  REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for reftest-wait to be removed
10:36:45     INFO -  REFTEST INFO | Loading a blank page
10:36:45     INFO -  REFTEST TEST-END | http://10.0.2.2:8888/tests/layout/reftests/text-svgglyphs/svg-glyph-extents.html
Confirmed that these tests went green post-backout.
100% reproducible in my emulator build.  Fixing mismatched tags in layout/tables/crashtests/691824-1.xhtml seems to solve that failure.  Don't know why.  See also bug 691824.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #31)
> 100% reproducible in my emulator build.  Fixing mismatched tags in
> layout/tables/crashtests/691824-1.xhtml seems to solve that failure.

Fixes only "691824-1.xhtml" itself, but I set three other failures: "text-overflow-bug666751-1.html" (C2), "text-overflow-bug666751-2.html" (C2), and "601427.html" (C1).  These three fail with the same error as "691824-1.xhtml" and are 100% reproducible locally without the two PRs.

Command I used: `cd ${B2G_HOME}; ./mach crashtest-remote`.  NI? Ehsan if he has any idea about this.
Flags: needinfo?(ehsan)
Attached file crashtest-remote-all_no-rmnets.log.gz (obsolete) —
adb logcat for crashtest-remote with "601427.html" fixed and without rmnet enumeration.
Blocks: 957917
Whenever I instantiate a rmnet device using original SMC91C111 driver, the two reftest and crashtest cases fails.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #34)
> Whenever I instantiate a rmnet device using original SMC91C111 driver, the
> two reftest and crashtest cases fails.

Confirmed by appending qemu tail arguments "-net nic -net nic -net user" to testing/marionette/client/marionette/emulator.py.
All versions (ics, jb, kk) of ARM emulators are vulnerable. :(
I think that's a problem in either qemu or emulator kernel images.  Cancel NI? for Ehsan.
Flags: needinfo?(ehsan)
(In reply to comment #37)
> I think that's a problem in either qemu or emulator kernel images.  Cancel NI?
> for Ehsan.

OK!  :-)  Let me know if there is anything else I can help with, though!
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #35)
> Confirmed by appending qemu tail arguments "-net nic -net nic -net user" to
> testing/marionette/client/marionette/emulator.py.

"-net nic -net nic,vlan=1 -net user" works!
URL:
I think there is a big problem in QEmu VLAN design.  VLAN is used to group a certain NICs but each of them should still maintain its own state independently.  However, as we can see in `qemu_can_send_packet` and `qemu_deliver_packet`, the second NIC may have a chance to affect the first one in the same VLAN by:

  1) processing order in a VLAN is reversed -- eth1, the second NIC, is process prior to eth0,
  2) eth1 always returns |size| in `qemu_deliver_packet` because its link state is down,
  3) when it iterates to eth0, no matter what it returns from its receive function, the return value of `qemu_deliver_packet` is always `size` because `ret` is now positive,
  4) that packet is then dropped because it has been successfully processed, so eth0 will never have a chance to receive it again.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #40)
>   1) processing order in a VLAN is reversed -- eth1, the second NIC, is
> process prior to eth0,

Hmm, Bluetooth scatternet is reversed, but nic VLAN is not.  Anyway, we have to bind "-net user" to the same VLAN with the outgoing interfaces, and only one SLIRP user net is allowed currently. :(
Blocks: 978071
Cherry-pick http://git.qemu.org/?p=qemu.git;a=commit;h=60c07d933c66c4b30a83b7ccbc8a0cb3df1b2d0e seems to fix the problem.  "layout/tables/crashtests/crashtests.list" no longer fails.  Trying all others.
Both following two passed.  Got to find some way to verify on try server.

$ ./mach reftest-remote gecko/layout/reftests/text-svgglyphs/reftest.list
$ ./mach crashtest-remote gecko/layout/tables/crashtests/crashtests.list
Duplicate of this bug: 797175
Attachment #8345237 - Attachment is obsolete: true
Attachment #8345239 - Attachment is obsolete: true
Attachment #8379453 - Attachment is obsolete: true
Attached file radio.log.gz
When running "dom/mobileconnection/tests/marionette/manifest.ini" and "dom/system/gonk/tests/marionette/test_data_connection.js", somehow rild/qemu begin to return rmnet1 at some point. While we don't have multiple PDP emulation at this time (bug 821578), `configureInterface` begins to return -EEXIST.
Today's try runs: https://tbpl.mozilla.org/?tree=Try&rev=dcc83834deca
* only changes for this bug are included
* use a more generic format "android.ifrename=name1:name2" for kernel cmdline instead.
* no more problems found in comment 29.
Manifest bumper has included these changes in m-c.  Resolve as fixed now.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Blocks: 991025
You need to log in before you can comment on or make changes to this bug.