Closed
Bug 1175387
Opened 9 years ago
Closed 9 years ago
libmdns block on recvfrom()
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:2.5+, firefox42 fixed, b2g-master fixed)
People
(Reporter: schien, Assigned: xeonchen)
References
Details
Attachments
(1 file, 2 obsolete files)
3.08 KB,
patch
|
xeonchen
:
review+
|
Details | Diff | Splinter Review |
This is a follow-up for bug 1171827.
Comment 1•9 years ago
|
||
I encountered this bug again. After debugging mdnsd, I found that it error handling logic and mdnsd enters an infinite loop:
* In external/mdnsresponder/mDNSPosix/mDNSPosix.c::mDNSPosixRunEventLoopOnce(), it calls select(). select() returns -1 and errno = EBADF. One of the file descriptor being watched has an error. It notifies its caller that there are no data dispatched.
* In external/mdnsresponder/mDNSPosix/PosixDaemon.c::MainLoop(), continues to the next iteration, with gotData == false and enter mDNSPosixRunEventLoopOnce(). Because EBADF was not handled, select() will still return -1 and EBADF. This becomes a CPU spin without any progress.
We need to upgrade mdnsresponder to see if this can be fixed in a newer version.
Currently have this happening on the Sony foxfooding device. I had a 20 minute period where:
- first was slow to wake
- took 30 secs to respond to each tap
- after device went back to sleep, could not wake it with the power button, device was hot, adb reported 80% CPU consumption of root b2g process.
While reading bugs about this for 5-10 mins, CPU consumption suddenly went back to zero, and device back to normal.
Comment 3•9 years ago
|
||
Gary, Cervantes, are there any updates on this? Based on comment 2 and other reports, foxfooders are still facing this issue. This is a very bad bug.
blocking-b2g: --- → spark+
Flags: needinfo?(xeonchen)
Flags: needinfo?(cyu)
Comment 4•9 years ago
|
||
We may need to patch this our own and upstream the fix. We are already using master branch of mdnsresponder and it hasn't been updated for ~3yrs.
Flags: needinfo?(cyu)
Comment 5•9 years ago
|
||
(In reply to Cervantes Yu from comment #4)
> We may need to patch this our own and upstream the fix. We are already using
> master branch of mdnsresponder and it hasn't been updated for ~3yrs.
Yes, let's please do that :)
blocking-b2g: spark+ → 2.5+
Assignee | ||
Comment 6•9 years ago
|
||
We don't know what triggers this issue, which makes it difficult to investigate and verify.
The library is also used on Android and Apple's devices, but they don't have this issue?
Garvan, do you mean your device went back to normal without any action?
Did you change the network status before the situation happened? E.g. enter basement/elevator
which cause cellular/WiFi disconnect?
Flags: needinfo?(xeonchen) → needinfo?(gkeeley)
> Garvan, do you mean your device went back to normal without any action?
Yes, I was running adb top in a terminal, and then noticed the CPU usage had dropped, I picked up the device and it was responding again. The entire time, the device was on my desk (from when I noticed the problem to when it was resolved).
> Did you change the network status before the situation happened? E.g. enter
> basement/elevator
> which cause cellular/WiFi disconnect?
No.
I am sure other foxfooders will have this happen, we should let the list know we are looking for more feedback on the events surrounding this issue.
Flags: needinfo?(gkeeley)
Assignee | ||
Comment 8•9 years ago
|
||
The file descriptor owned by |DNSServiceRef| is closed before being detached from STS. It may cause |EBADF| while polling on a closed descriptor.
Besides, the descriptor is closed twice by |DNSServiceRefDeallocate| and |PR_Close|, so I use |PR_CreateSocketPollFd| instead.
Patrick, would you give some comment on these modifications?
Attachment #8628111 -
Flags: feedback?(mcmanus)
Comment 9•9 years ago
|
||
(In reply to Gary Chen [:xeonchen] from comment #8)
> Besides, the descriptor is closed twice by |DNSServiceRefDeallocate| and
> |PR_Close|, so I use |PR_CreateSocketPollFd| instead.
we definitely need to backport any fix becuase of that. It could accidentally close another (unrelated socket - opened after the first close).
Comment 10•9 years ago
|
||
Comment on attachment 8628111 [details] [diff] [review]
0001-Bug-1175387-close-file-descriptor-after-detached-fro.patch
Review of attachment 8628111 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/dns/mdns/libmdns/MDNSResponderOperator.cpp
@@ +122,5 @@
> if (osfd == -1) {
> return NS_ERROR_FAILURE;
> }
>
> + mFD = PR_CreateSocketPollFd(osfd);
to be honest, I've never used that bit of nspr.. but the documentation says the result is only for use with pr_poll() and you're doing way more than that - so I don't think it is a good idea.
I think you can continue with the old importFile approach, but just before the pr_close() do PR_ChangeFileDescNativeHandle() to change the osfd to -1.
Attachment #8628111 -
Flags: feedback?(mcmanus) → feedback-
Assignee | ||
Comment 11•9 years ago
|
||
Close fd after detached from STS, and make PR_Close does nothing.
Attachment #8628111 -
Attachment is obsolete: true
Attachment #8638249 -
Flags: review?(mcmanus)
Updated•9 years ago
|
Attachment #8638249 -
Flags: review?(mcmanus) → review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 12•9 years ago
|
||
Keywords: checkin-needed
Assignee | ||
Comment 14•9 years ago
|
||
rebase, carry r+
Attachment #8638249 -
Attachment is obsolete: true
Attachment #8643449 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 15•9 years ago
|
||
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S4 (07Aug)
Updated•9 years ago
|
status-b2g-master:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•