Closed
Bug 835920
Opened 12 years ago
Closed 12 years ago
UnixSocket and Ril mis-interpret read() returning 0 as failure and enter infinite reconnect loop
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bjacob, Assigned: bjacob)
References
Details
Attachments
(3 files, 2 obsolete files)
1.78 KB,
patch
|
qdot
:
review+
|
Details | Diff | Splinter Review |
1.72 KB,
patch
|
Details | Diff | Splinter Review | |
1.37 KB,
patch
|
qdot
:
review-
|
Details | Diff | Splinter Review |
read returning 0 indicates end-of-file, accoring to the man page I'm reading. But Ril.cpp seems to be interpreting it as an error condition and then tries to reconnect.
http://hg.mozilla.org/mozilla-central/file/6cca454559c8/ipc/ril/Ril.cpp#l256
This is giving me an infinite loop with this repeated many times in logcat:
I/Gonk ( 367): Cannot read from network, error 0
I/Gonk ( 367): Socket open for RIL
The attached patch makes it only interpret strictly negative return values as errors; in the zero case, we then seem to be correctly acting on EOF at this line:
http://hg.mozilla.org/mozilla-central/file/6cca454559c8/ipc/ril/Ril.cpp#l284
At least, this removes the infinite reconnecting problem for me.
Attachment #707720 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 1•12 years ago
|
||
Here's an attempt at a theory for why this is happening only for me now and not for everyone everytime. The read size here, RilRawData::MAX_DATA_SIZE, is defined in Ril.h to be 1024. So IIUC we will only hit this bug if the size of the file being read here is an exact multiple of 1024 bytes.
Comment 2•12 years ago
|
||
RIL.cpp is going to be rewritten with UnixSocket in bug 826931. However, the implementation of UnixSocket has the same problem: http://hg.mozilla.org/mozilla-central/file/6cca454559c8/ipc/unixsocket/UnixSocket.cpp#l656
Assignee | ||
Comment 4•12 years ago
|
||
Thanks, here's a patch fixing it there.
Attachment #707731 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 5•12 years ago
|
||
This version avoids sending an empty task if read() returned 0. I assume that's better, but I don't really know for sure and that's untested.
Attachment #707731 -
Attachment is obsolete: true
Attachment #707731 -
Flags: review?(jones.chris.g)
Attachment #707733 -
Flags: review?(jones.chris.g)
Assignee | ||
Comment 6•12 years ago
|
||
And same for the RIL patch, avoid sending empty data if read() returned 0.
There's some trickiness around here: the if (!mIncoming) condition means it matters a lot that .forget() sets the pointer to null. So when we if() this out, we have to make sure that we still null the pointer.
Hope I got this right; seems to be working here, but I can't be very confident, as I'm in the middle of bigger changes and my build is non-functional anyways.
Attachment #707720 -
Attachment is obsolete: true
Attachment #707720 -
Flags: review?(jones.chris.g)
Attachment #707741 -
Flags: review?(jones.chris.g)
Comment 7•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #6)
> Created attachment 707741 [details] [diff] [review]
> ril patch
The whole content of this file is going to be removed. Fixing anything here is probably meaningless.
Assignee | ||
Comment 8•12 years ago
|
||
Well, how soon is it going to be removed? I seem to be hitting this with the gfx-layers-refactoring branch, so I'm blocked on either fixing Ril.cpp or having it replaced.
Assignee | ||
Updated•12 years ago
|
Summary: Ril mis-interprets read() returning 0 as failure and enters infinite reconnect loop → UnixSocket and Ril mis-interpret read() returning 0 as failure and enter infinite reconnect loop
Comment 9•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #8)
> Well, how soon is it going to be removed? I seem to be hitting this with the
> gfx-layers-refactoring branch, so I'm blocked on either fixing Ril.cpp or
> having it replaced.
I'll update my patches tomorrow (actually 6 hr later or "this morning"), and hopefully I can land them tonight (20+ hr later I guess).
Assignee | ||
Comment 10•12 years ago
|
||
Great, that works for me (anything that lands before the end of next week is good enough for me).
Assignee | ||
Comment 11•12 years ago
|
||
Comment on attachment 707741 [details] [diff] [review]
ril patch
cancelling this review request per above comments.
Attachment #707741 -
Flags: review?(jones.chris.g)
Updated•12 years ago
|
Attachment #707733 -
Flags: review?(jones.chris.g) → review?(vyang)
Comment 12•12 years ago
|
||
Comment on attachment 707733 [details] [diff] [review]
unixsocket patch
Actually I'm not a IPC peer or the author of UnixSocket, but in my opinion, I would rather have:
ssize_t ret = read(aFd, data, MAX_READ_SIZE);
if (!ret) {
// EOF found.
return;
}
if (ret < 0) {
...
Attachment #707733 -
Flags: review?(vyang) → review?(kyle)
Updated•12 years ago
|
Attachment #707733 -
Flags: review?(kyle) → review+
Assignee | ||
Comment 13•12 years ago
|
||
Assignee: nobody → bjacob
Comment 14•12 years ago
|
||
Crap. I didn't see vicamo's comment before I r+'d. I agree on the cleanliness, can be filed as a followup.
Assignee | ||
Comment 15•12 years ago
|
||
OK, will do asap.
Comment 16•12 years ago
|
||
The patch doesn't look correct to me. If read() returns EOF this means the peer close()'d the FD (e.g. the peer crashed or has been killed). For RIL.cpp we should treat this as an error and reconnect. read() returns 0 as not EOF only when we pass 0 as the buffer size, but this is not the case in Ril.cpp or UnixSocket.cpp.
This is non-blocking IO. If the peer sends multiple of 1024 bytes then we should get -1 and errno = EAGAIN on the last read() call, not 0 as EOF. As a test I applied the ril patch and stopped rilproxy to trigger the error, then I didn't see reconnection, even after rilproxy was then restarted. The result is: I couldn't dial out anymore. I didn't test UnixSocket part, but I think the result is similar.
Assignee | ||
Comment 17•12 years ago
|
||
I thought it still made sense to put the EOF handling _after_ the error handling: while read() returning both as separate return values makes this not matter in practice, _if_ the API was slightly different so that we could have both an error and EOF, I think that we would want to handle the error.
Also added an assertion to guard our assumption below that ret > 0.
Attachment #708574 -
Flags: review?(kyle)
Comment 18•12 years ago
|
||
Comment on attachment 708574 [details] [diff] [review]
follow-up patch based on Vicamo's suggestion
Review of attachment 708574 [details] [diff] [review]:
-----------------------------------------------------------------
::: ipc/unixsocket/UnixSocket.cpp
@@ +676,5 @@
> return;
> }
> + if (ret == 0) {
> + // EOF found, zero bytes read
> + return;
As Cervantes' comment #16, we should fix rilproxy crash/socket missing instead.
@@ +681,2 @@
> }
> + MOZ_ASSERT(ret > 0); // other cases handled above
This assertion seems to be meaningless.
@@ +688,2 @@
> if (ret < ssize_t(MAX_READ_SIZE)) {
> + // EOF found
It's not EOF here actually.
Comment 19•12 years ago
|
||
Comment on attachment 708574 [details] [diff] [review]
follow-up patch based on Vicamo's suggestion
Review of attachment 708574 [details] [diff] [review]:
-----------------------------------------------------------------
Agreeing with vicamo and cervantes' reviews here. Go ahead and throw vicamo on as addl. review for patches from here on out, since he's seeing UnixSocket more than anyone else on the team these days.
Attachment #708574 -
Flags: review?(kyle) → review-
Comment 20•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 21•12 years ago
|
||
Hi Benoit, may I have your build config? I tend to revert the changes made in this bug because reconnection on EOF is actually nothing wrong as Cervantes had described. When you --enable-b2g-ril, then there is supposed to be ril socket open for RilConsumer. We can't tell the cases you don't have rild at all or the rild dies. As a result, reconnect on such event is necessary and a correct behaviour. Please comment more on your case, config and let's whether can we find a solution for both.
Flags: needinfo?(bjacob)
Assignee | ||
Comment 22•12 years ago
|
||
Sorry for the delay replying. My .userconfig is:
GECKO_PATH=/hack/mozilla-graphics
export B2G_DEBUG=1
export B2G_NOOPT=1
Where mozilla-graphics does not have any relevant change. So I suppose that the only thing in my .userconfig that could make a difference is B2G_DEBUG.
Do as you wish -- I hold off from making an update patch according to review comments, until you've decided whether you want to revert this.
Flags: needinfo?(bjacob)
You need to log in
before you can comment on or make changes to this bug.
Description
•