Closed Bug 1014065 Opened 11 years ago Closed 10 years ago

[NFC] nfcd gets stuck during startup on emulator

Categories

(Firefox OS Graveyard :: NFC, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:backlog)

RESOLVED FIXED
2.0 S5 (4july)
tracking-b2g backlog

People

(Reporter: tzimmermann, Assigned: tzimmermann)

References

Details

Attachments

(4 files, 4 obsolete files)

Attached file emulator-5554.log
During startup on the emulator, nfcd gets stuck in |NfcManager::initialize|. The lines in the stack trace below might be incorrect, but the call is sNfaEnableEvent.wait() near line 140.

#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/futex_arm.S:41
#1  0x4012368c in __pthread_cond_timedwait_relative (cond=0x2a0161f8, mutex=0x2a0161fc, reltime=0x0) at bionic/libc/bionic/pthread.c:1127
#2  0x401236ec in __pthread_cond_timedwait (cond=0x2a0161f8, mutex=0x2a0161fc, abstime=<optimized out>, clock=<optimized out>) at bionic/libc/bionic/pthread.c:1150
#3  0x2a00a25a in CondVar::wait (this=0x2a0161f8, mutex=<optimized out>) at system/nfcd/src/nci/CondVar.cpp:33
#4  0x2a007904 in NfcManager::initialize (this=0x2a01aa70) at system/nfcd/src/nci/NfcManager.cpp:144
#5  0x2a002ffe in NfcService::enableNfc (this=0x2a01ab00) at system/nfcd/src/NfcService.cpp:639
#6  0x2a0031b4 in NfcService::handleEnableResponse (this=0x2a01ab00, event=<optimized out>) at system/nfcd/src/NfcService.cpp:608
#7  0x2a0036e8 in NfcService::eventLoop (this=0x2a01ab00) at system/nfcd/src/NfcService.cpp:345
#8  0x2a00378e in serviceThreadFunc (arg=0x2a01ab00) at system/nfcd/src/NfcService.cpp:85
#9  0x40121a5c in __thread_entry (func=0x2a003779 <serviceThreadFunc(void*)>, arg=0x2a01ab00, tls=0x404a9f00) at bionic/libc/bionic/pthread_create.cpp:92
#10 0x40121bd8 in pthread_create (thread_out=0x2a016014, attr=<optimized out>, start_routine=0x78, arg=0x2a01ab00) at bionic/libc/bionic/pthread_create.cpp:201
Hi Dimi,

When waiting for the sNfaEnableEvent, I think I should see a call to nfaDeviceManagementCallback, but that never happens. The driver is blocked somewhere internally. Do you have an idea?

I don't have time to debug this further today. I'll continue tomorrow.
Flags: needinfo?(dlee)
See Also: → 1000063
Hi Thomas,
  Looks like one of the internal thread of libnfc-nci get stuck, but i cannot reproduce in my platform.
Could you help change the following in NfcManager.cpp to get more debug messages ?  Thanks

num = initializeGlobalAppLogLevel();  -->  num = 5
Flags: needinfo?(dlee)
Hi,

I figured out that the driver is blocked because it expects a CORE_RESET_NTF from the device during initialization. I have no idea why this hasn't shown up before, or why I'm the only one who sees this problem.

When I send the notification, I currently get a segmentation fault during the init process. Some callback pointer gets overwritten. See below. I'm busy with figuring out what happens.


Breakpoint 1, HAL_NfcOpen (p_hal_cback=0x406b0469 <BroadcomHalCallback(UINT8, tHAL_NFC_STATUS)>, p_data_cback=0x406b0405 <BroadcomHalDataCallback(UINT16, UINT8*)>)


Here nfc_hal_cb.p_hal_cback is set to the value of p_hal_cback.


    at external/libnfc-nci/halimpl/bcm2079x/hal/hal/nfc_hal_api.c:109
109	    if (p_hal_cback)
(gdb) s
111	        nfc_hal_dm_init ();
(gdb) s
nfc_hal_dm_init () at external/libnfc-nci/halimpl/bcm2079x/hal/hal/nfc_hal_dm.c:992
992	{
(gdb) s
993	    HAL_TRACE_DEBUG0 ("nfc_hal_dm_init ()");
(gdb) c
Continuing.
[New Thread 47.373]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47.373]
0x406c0594 in nfc_hal_cb () from /home/mozilla/Projects/mozilla/src/B2G-master-emulator-jb/out/target/product/generic/symbols/system/lib/hw/nfc_nci.generic.so
(gdb) up
#1  0x406b2d06 in nfc_hal_main_pre_init_done (status=<optimized out>) at external/libnfc-nci/halimpl/bcm2079x/hal/hal/nfc_hal_main.c:251
251	    nfc_hal_cb.p_stack_cback (HAL_NFC_OPEN_CPLT_EVT, status);
(gdb) print nfc_hal_cb 
$1 = {p_stack_cback = 0x406b0401 <SyncEvent::~SyncEvent()+16>, p_data_cback = 0x406b0405 <BroadcomHalDataCallback(UINT16, UINT8*)>, quick_timer_queue = {


And at this point the stored address is different, which leads to a call into a random function.


    p_first = 0x406c05dc, p_last = 0x406c05dc, last_ticks = 10}, timer = {p_next = 0x0, p_prev = 0x0, p_cback = 0x406b2e25 <nfc_hal_main_timeout_cback>, ticks = -2147483648, 
    param = 0, event = 2, in_use = 0 '\000'}, ncit_cb = {nci_ctrl_size = 255 '\377', rcv_state = 0 '\000', rcv_len = 0, p_rcv_msg = 0x2a020e9c, p_frag_msg = 0x0, 
    p_pend_cmd = 0x0, nci_ras = 0 '\000', nci_wait_rsp_timer = {p_next = 0x0, p_prev = 0x0, p_cback = 0x406b6311 <nfc_hal_nci_cmd_timeout_cback>, ticks = -2147483648, 
      param = 0, event = 0, in_use = 0 '\000'}, nci_wait_rsp = 0 '\000', last_hdr =  <incomplete sequence \374>, last_cmd = "\000", p_vsc_cback = 0x0}, dev_cb = {
    initializing_state = 0 '\000', brcm_hw_id = 544807683, next_dm_config = 0 '\000', next_startup_vsc = 0 '\000', power_mode = 0 '\000', snooze_mode = 8 '\b', 
    new_snooze_mode = 8 '\b', nfc_wake_active_mode = 0 '\000', lp_timer = {p_next = 0x0, p_prev = 0x0, p_cback = 0x406b52fd <nci_brcm_lp_timeout_cback>, ticks = 10, 
      param = 0, event = 0, in_use = 1 '\001'}, p_prop_cback = 0}, nvm_cb = {project_id = 256, ver_major = 147, ver_minor = 157, fpm_size = 6994, lpm_size = 0, 
    flags = 8 '\b', nvm_type = 1 '\001'}, prm = {state = 0 '\000', flags = 0, cur_patch_len_remaining = 0, p_cur_patch_data = 0x0, cur_patch_offset = 0, dest_ram = 0, 
    timer = {p_next = 0x0, p_prev = 0x0, p_cback = 0, ticks = 0, param = 0, event = 0, in_use = 0 '\000'}, spd_patch_needed_mask = 0, spd_patch_count = 0 '\000', 
    spd_cur_patch_idx = 0 '\000', spd_patch_desc = {{power_mode = 0 '\000', len = 0}, {power_mode = 0 '\000', len = 0}}, p_spd_patch = 0x0, spd_patch_len_remaining = 0, 
    spd_patch_offset = 0, format = 0 '\000', p_cback = 0, patchram_delay = 0}, prm_i2c = {p_patch = 0x0, prei2c_delay = 0, len = 0}, hci_cb = {hci_timer = {p_next = 0x0, 
      p_prev = 0x0, p_cback = 0x406b3b8d <nfc_hal_hci_timeout_cback>, ticks = 0, param = 0, event = 0, in_use = 0 '\000'}, p_hci_netwk_info_buf = 0x0, 
    p_hci_netwk_dh_info_buf = 0x0, hci_netwk_config_block = 0 '\000', b_wait_hcp_conn_create_rsp = 0 '\000', clear_all_pipes_to_uicc1 = 0 '\000', 
    hci_fw_workaround = 1 '\001', hci_fw_validate_netwk_cmd = 1 '\001', hcp_conn_id = 0 '\000'}, pre_discover_done = 0 '\000', max_rf_credits = 0 '\000', 
  trace_level = 0 '\000'}
(gdb)
Assignee: nobody → tzimmermann
Status: NEW → ASSIGNED
Attachment #8428714 - Flags: review?(dlee)
Attachment #8428716 - Flags: review?(vyang)
I'm still not sure what the problem was. The attached patches cleanup kernel and emulator to be more compatible with the user-space driver.
Attachment #8428714 - Flags: review?(dlee) → review+
Comment on attachment 8428716 [details]
Github pull request for kernel

Looks good to me, but do you have corresponding pull request for qemu-kernel at hand?  It won't take effect automatically without PRs to kernel binaries.
Attachment #8428716 - Flags: review?(vyang) → review+
Blocks: b2g-NFC-2.0
blocking-b2g: --- → backlog
Attached file Github pull request for kernel binary (obsolete) —
Hi Vicamo,

Sorry for this taking a while. I've ben on PTO for nearly two weeks and just returned today.

The pull request contains an updated kernel binary. Looking at the history of the repo makes me wonder if we need a better update process for the contained kernel binaries. Right now, everyone just seems to update pre-built kernels according to their needs.
Attachment #8437458 - Flags: review?(vyang)
(In reply to Thomas Zimmermann (on PTO) [:tzimmermann] [:tdz] from comment #8)
> Created attachment 8437458 [details]
> Github pull request for kernel binary
> 
> Hi Vicamo,
> 
> Sorry for this taking a while. I've ben on PTO for nearly two weeks and just
> returned today.

Me 2. Returned from a two weeks vocation on Jun 1. Never mind.

> The pull request contains an updated kernel binary. Looking at the history
> of the repo makes me wonder if we need a better update process for the
> contained kernel binaries. Right now, everyone just seems to update
> pre-built kernels according to their needs.

We can build kernel from source for qemu, but as a trade-off, everyone has to begin with a 1G git repository. prebuilts/qemu-kernel is now a 320MB folder though.  Another commonly used trick: specifying "depth=1" in manifest.xml, which, as it suggests, only clones the latest objects in the tree and results in a smaller git footprint.
Comment on attachment 8437458 [details]
Github pull request for kernel binary

Please send pull requests to either master or b2g-4.4.2_r1 branch instead.
Attachment #8437458 - Flags: review?(vyang) → review-
Changes since v1:

  - opening pull request into master branch
Attachment #8437458 - Attachment is obsolete: true
Attachment #8437597 - Flags: review?(vyang)
Attached file failed-adb-logcat-kernel-tdz.log (obsolete) —
FAILED TESTS
-------
test_ndef.js
test_nfc_peer_sendndf.js
test_nfc_tag.js

Gecko: git-1ce168a (HEAD of the day)
Gaia: git-bfc48b6
qemu: simple merge of tdz/bug-1014065
qemu-kernel: fast forwarded to tdz/bug-1014065
Attached file failed-adb-logcat-kernel-orig.log.gz (obsolete) —
FAILED TESTS
-------
test_ndef.js
test_nfc_peer_sendndf.js
test_nfc_tag.js

Gecko: git-1ce168a (HEAD of the day)
Gaia: git-bfc48b6 (HEAD of the day)
qemu: simple merge of tdz/bug-1014065
qemu-kernel: git-431afac (HEAD of the day)
Attached file failed-adb-logcat-orig.log.gz (obsolete) —
FAILED TESTS
-------
test_ndef.js
test_nfc_peer_sendndf.js
test_nfc_tag.js

Gecko: git-1ce168a (HEAD of the day)
Gaia: git-bfc48b6 (HEAD of the day)
qemu: git-cba8ebe (HEAD of the day)
qemu-kernel: git-431afac (HEAD of the day)
Hi, it seems those three NFC test cases fail unconditionally with current B2G code base. Am I supposed to continue merge this or just wait for a stable version?
This worked when I tested it a while ago. But I've heard reports that the latest b2g doesn't work with the emulator for NFC. Let's wait and see if things stabilize.
Attachment #8437597 - Flags: review?(vyang)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #16)
> This worked when I tested it a while ago. But I've heard reports that the
> latest b2g doesn't work with the emulator for NFC. Let's wait and see if
> things stabilize.

Has this been filed as a bug somewhere? It will be easier to track and come back to this bug right away.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #17)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #16)
> > This worked when I tested it a while ago. But I've heard reports that the
> > latest b2g doesn't work with the emulator for NFC. Let's wait and see if
> > things stabilize.
> 
> Has this been filed as a bug somewhere? It will be easier to track and come
> back to this bug right away.

Yes, it's bug 1021180. I'm adding the dependency here.
Depends on: 1021180
Comment on attachment 8437597 [details]
Github pull request for kernel binary (v3)

Changes since v2:

  - added reference to source revision to commit message

Hi Vicamo,

The regression that prevented NFC tests from running should now be fixed by bug 1020034. Can you please try again and merge the branches if possible? Thanks.
Attachment #8437597 - Attachment description: Github pull request for kernel binary (v2) → Github pull request for kernel binary (v3)
Attachment #8437597 - Flags: review?(vyang)
Comment on attachment 8437597 [details]
Github pull request for kernel binary (v3)

Works fine for me now.
Attachment #8437597 - Flags: review?(vyang) → review+
Attachment #8438245 - Attachment is obsolete: true
Attachment #8438246 - Attachment is obsolete: true
Attachment #8438257 - Attachment is obsolete: true
Thanks a lot, Vicamo!
I think this is already landed 
http://dxr.mozilla.org/mozilla-central/source/b2g/config/emulator-jb/sources.xml#115

Tomcat, can we mark this as Resolved Fixed ?

Thanks
Flags: needinfo?(cbook)
(In reply to Yoshi Huang[:allstars.chh] from comment #24)
> I think this is already landed 

Should be 
http://hg.mozilla.org/mozilla-central/rev/ab9749e7bcf6

> Tomcat, can we mark this as Resolved Fixed ?
> 
> Thanks
yeah i guess we can call this fixed also according to https://github.com/mozilla-b2g/platform_prebuilts_qemu-kernel/commit/bce69473f3d63ab927442f4301f5fb4833d20008 and co
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(cbook)
Resolution: --- → FIXED
Target Milestone: --- → 2.0 S5 (4july)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: