Closed Bug 1031122 Opened 6 years ago Closed 5 years ago

Add I/O poll library for daemons

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tzimmermann, Assigned: tzimmermann)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

We (will) have several daemons on FFOS that are build around file-descriptor I/O. It would be nice to have a small library in the style of libevent for implementing the poll loop.
Attached file libfdio repository
Hi Julian,

Here's the I/O library. It should reside under system/. I fixed the points you mentioned and some other similar issues.

The Bluetooth daemon is still early WIP, so I hope you don't run into too many bugs. ;)
Comment on attachment 8446980 [details]
libfdio repository

Thanks Michael for taking a look.

Here's the epoll lib. It's supposed to be used by the Supervisor daemon and a Bluetooth daemon.

Some ideas for future:

  - helper functions to support event-specific callbacks,
  - helper functions to forward between file descriptors via splice(),
  - use lib in rilproxy
Attachment #8446980 - Flags: review?(mwu)
Awesome, thank you very much.
I scrolled over it a little, it seems fine, but it was just a quick look :)
Comment on attachment 8446980 [details]
libfdio repository

Updated repository. Changes since v1:

  - fix Android.mk to install libfdio.so in /system/lib
  - fix task allocation and sending
  - added per-event-type call-back functions
I found a problem when adding a file descriptor to epoll, that has previously been removed. 

When removing a file descriptor, the corresponding |fd_state| isn't cleaned up, this leads to the usage of EPOLL_CTL_MOD on a file descriptor which epoll doesn't know about.

add fd 5 -> success
remove fd 5 -> sucess
add fd 5 -> failure

the pull request fixes that.
Attachment #8449125 - Flags: review?(tzimmermann)
Comment on attachment 8449125 [details] [review]
Link to Github pull-request: https://github.com/tdz/platform_system_libfdio/pull/1

Hi Julian,

Thanks for your patch. You just solved the problem that I saw before I turned of my computer yesterday. :D

Is the memset just for safety? I don't like memsets, memcpy, etc. They often do too much and create unnecessary read/write operations. For your patch, I'd prefer if you'd simply set |fd_state[fd].event.events| to 0 in |remove_fd_from_epoll_loop| function. That's the indicator whether an fd is being epoll'ed. I don't think we should clear the rest for no reason.
Attachment #8449125 - Flags: review?(tzimmermann) → review-
I haven't looked too deeply at this but there is just one concern after glancing at this - licensing. For gonk things, we usually use apache 2 since most of the projects there use that.
I did look at the code too btw. It mostly looks fine to me. It's interesting that you're actually using the pipe to transfer data rather than to merely wake up the event loop. Need to look at the code longer though.
Oh, OK! I'll change the license. But when I looked into the source code of rilproxy and nfcd, they both seemed to use MPL. Maybe their license should be changed as well?
rilproxy isn't actually shipped yet, so that's not a problem. nfcd is interesting though - didn't realize. It might make sense to change the license on that.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #6)
> Is the memset just for safety? I don't like memsets, memcpy, etc. They often
> do too much and create unnecessary read/write operations. For your patch,
> I'd prefer if you'd simply set |fd_state[fd].event.events| to 0 in
> |remove_fd_from_epoll_loop| function. That's the indicator whether an fd is
> being epoll'ed. I don't think we should clear the rest for no reason.

I used memset() to make sure that we don't have function ptr and data ptr pointing somewhere, even though we don't use them. Usually before a function ptr is being used, something like if (func_ptr) should be in place.

Let say, epoll_wait() returns a fd that we don't use, but we have an entry in |fd_state| for that fd and the |func| is set because of a previous entry. What will happen is that it executes that old function ptr which could most likely lead to undefined behavior. So I think keeping function ptr and ptr in generell set to null when not used is much cleaner. But you are right in the code size difference, but it will just be a couple of bytes.

tl;dr I did it for safety reasons, but it is your call
Hi

I usually don't do this extra cleanup, because clearing more than the actual SPOT can easily hide bugs in the code, where your SPOT is ignored or misinterpreted. Programs should fail in this case.

But I see your point and it seems reasonable to me in the context of system services. Please clear these pointers explicitly, ideally in a separate function (e.g., void clear_fd_state(struct fd_state* fd_state)). Thanks!
Sounds good thanks :)

I made the change locally, but I don't have the flame here so I will see if the code runs tomorrow and then I will change the pull request. It is already getting too late here...
Alright, I had a little struggle with the build system -.- but I got a lot of help so fixed my problems.

I changed the pull request to implement a clear_fd_state(struct fd_state* fd_state) function :)
If you still want it to be done differently just let me know ^^
Attachment #8449125 - Flags: review- → review?(tzimmermann)
Comment on attachment 8449125 [details] [review]
Link to Github pull-request: https://github.com/tdz/platform_system_libfdio/pull/1

Please see my comments on Github. I'll have to fix the license before I can pull your bug fix anyway.
Attachment #8449125 - Flags: review?(tzimmermann) → review-
Attachment #8449125 - Flags: review- → review?(tzimmermann)
Attachment #8449125 - Flags: review?(tzimmermann) → review-
Thanks so far. There are just some minor style issues left.
Comment on attachment 8446980 [details]
libfdio repository

Updated repository. Changes since v2:

  - changed license to Apache 2.0
Comment on attachment 8449125 [details] [review]
Link to Github pull-request: https://github.com/tdz/platform_system_libfdio/pull/1

Fixed in master. Thanks!

I had to commit the patch directly, because the pull request was broken after me rebasing the repository's history. You're still named as author, of course.
Attachment #8449125 - Flags: review- → review+
Ok very good thanks. Don't worry about the pull request ^^
Hey Thomas, 

what do you think about the possiblity to let the epoll loop exit in a controlled manner.
By exposing a library function that can be called from any callback on an file descriptor event.

I think this would be very useful in the case where the user wants to wait for a single event but doesn't want to block the file descriptor until it happens (blocking fd will drain battery)

It would probably be useful in other cases as well, like clean exit.
Having control over the timeout when calling epoll_loop() would be great too
We could also consider making the timeout changable outside epoll_loop so that I can have a different timeout in each iteration. A callback could be used to notify the user that a timeout occured.
(In reply to Julian Hector [:tedd] from comment #20)
> Hey Thomas, 
> 
> what do you think about the possiblity to let the epoll loop exit in a
> controlled manner.
> By exposing a library function that can be called from any callback on an
> file descriptor event.

Yes, I thought about this as well, but I'd rather implement this as return value from the callback function. Let's say the function declaration looks like

enum io_result func(void* data);

and enum io_result is one of

enum {
  IO_OK, /* continue with next event */
  IO_POLL, /* drop all current events, poll again */
  IO_EXIT, /* exit loop */
  IO_ABORT /* abort loop */
};

> I think this would be very useful in the case where the user wants to wait
> for a single event but doesn't want to block the file descriptor until it
> happens (blocking fd will drain battery)

Well, it won't drain battery if you're not actively polling (i.e., busy waiting). But be warned about such designs. We had this in earlier revisions in some of Gecko's I/O code and it's really bad. It's better to always wait for events by using epoll, etc.

> It would probably be useful in other cases as well, like clean exit.
(In reply to Julian Hector [:tedd] from comment #21)
> Having control over the timeout when calling epoll_loop() would be great too

(In reply to Julian Hector [:tedd] from comment #22)
> We could also consider making the timeout changable outside epoll_loop so
> that I can have a different timeout in each iteration. A callback could be
> used to notify the user that a timeout occured.

On Linux there is the timerfd interface, [1] which exposes a timer by file descriptor. The fd can be monitored by epoll.

Programs that need timeouts can use timer fds. The existing code should be able to handle them perfectly. I wouldn't mind adding a helper function where it makes sense, but I definitely don't want to duplicate existing APIs.

[1] http://man7.org/linux/man-pages/man2/timerfd_create.2.html
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #23)
> ... it's really bad. It's better to always wait
> for events by using epoll, etc.

That's why I want to wait on an event and have a way to do a clean exit of the epoll_wait()

(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #24)
> On Linux there is the timerfd interface, [1] which exposes a timer by file descriptor. The fd can be 
> monitored by epoll.

I was more thinking about control over the |timeout| parameter of the epoll_wait() function[1].
so like:

if (!nevents) {
    if (on_timeout_callback)
        on_timeout_callback();
}

and in on_timeout_callback you could call the api function to let the loop exit, and resend the message and start the waiting for the response again... but if you don't want to do that I can take a look at timerfd, haven't used it before so I don't really know how it works. But that can be changed ^^

[1] http://man7.org/linux/man-pages/man2/epoll_wait.2.html
Hi Julian,

Got it. But what's the use case for that?

The timeout for epoll_wait is only useful as a _global_ timeout to check if anything has happened within n seconds. As soon as two modules what to use that functionality at the same time, they'd conflict.

If you want a per-event timeout, you should

  1) install the event file descriptor
  2) install the timer fd
  3a) if the event happens first, cancel the timer in the event handler
  3b) if the timeout happens first, cancel the event in the timeout handler

That is flexible, generic, and modules don't conflict with each other. Maybe there could be support in the library for setting up the event and timeout at the same time.
Comment on attachment 8446980 [details]
libfdio repository

Updated Github repository:

  - use per-event functions for tasks
  - declare C linkage for public interfaces
Yeah good point, I was thinking right now about one connection only.
Attachment #8449125 - Attachment is obsolete: true
Hi Julian,

I made a patch to support return a value from callback functions. The returned value controls the event loop. Could you have a look and give some feedback? Thanks!
Attachment #8453101 - Flags: feedback?(jhector)
Comment on attachment 8453101 [details]
Support return values in callback functions

Thanks for the patch, I gave you some feedback in the commit:
https://github.com/tdz/platform_system_libfdio/commit/b89b1e12cdb6b70c47a849c3f6ee5a84a290f0a8
Attachment #8453101 - Flags: feedback?(jhector) → feedback+
At some point around the ICS releases, the logging functions/macros were modified. IIRC, they did something like LOGX -> ALOGX. Our emulator is actually new enough to have the ALOGX style macros, but I'm less sure about CAF based ICS devices. If this is going to be used on ICS manifests, you should check if hamachi and tarako will work.

Also, around KK/4.4, I think some log includes were moved around. I don't remember the details of the move, but it should be fine if things work on the kk emulator build.

What's the callback in epoll_loop for?

In send_task, seems like you actually want a static assertion for assert(sizeof(task) <= PIPE_BUF);

The use of the fd to map directly into fd_state seems bizarre to me. That means we can only reliably add fds to listen to soon after the program starts, right? Or only if the program is careful to use less than 64 fds at any moment...
Hi Michael

(In reply to Michael Wu [:mwu] from comment #31)
> At some point around the ICS releases, the logging functions/macros were
> modified. IIRC, they did something like LOGX -> ALOGX. Our emulator is
> actually new enough to have the ALOGX style macros, but I'm less sure about
> CAF based ICS devices. If this is going to be used on ICS manifests, you
> should check if hamachi and tarako will work.
> 
> Also, around KK/4.4, I think some log includes were moved around. I don't
> remember the details of the move, but it should be fine if things work on
> the kk emulator build.

Good point. I'll check this.

> What's the callback in epoll_loop for?

It's the init function. It allows to add file descriptors to the poll loop before the loop is first started. And calling |init_task_queue| is done by init.

> In send_task, seems like you actually want a static assertion for
> assert(sizeof(task) <= PIPE_BUF);

Good point.

> The use of the fd to map directly into fd_state seems bizarre to me. That
> means we can only reliably add fds to listen to soon after the program
> starts, right? Or only if the program is careful to use less than 64 fds at
> any moment...

Allocating the internal data structures statically simplifies the library a lot. Dynamically reallocating data structures while they are used by epoll sounds like trouble.

The limit of 64 has been chosen arbitrarily. I think POSIX defines 1024 as minimum, but that seems to waste lot of memory on data structures. New file descriptors will always have the smallest available integer and I don't expect a single daemon to use more than that 64 at single point. But we can also increase the limit if necessary.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #32)
> > The use of the fd to map directly into fd_state seems bizarre to me. That
> > means we can only reliably add fds to listen to soon after the program
> > starts, right? Or only if the program is careful to use less than 64 fds at
> > any moment...
> 
> Allocating the internal data structures statically simplifies the library a
> lot. Dynamically reallocating data structures while they are used by epoll
> sounds like trouble.
> 

FWIW, in other places where I use epoll, I use a fixed data structure combined with a simple search through the array. There's typically very few fds being listened to, so that doesn't hurt much.

> The limit of 64 has been chosen arbitrarily. I think POSIX defines 1024 as
> minimum, but that seems to waste lot of memory on data structures. New file
> descriptors will always have the smallest available integer and I don't
> expect a single daemon to use more than that 64 at single point. But we can
> also increase the limit if necessary.

This needs to be documented if you intend to have a limit. This essentially means that this library cannot be used within gecko code - gecko code can easily reach hundreds of open FDs.
Hi Michael

(In reply to Michael Wu [:mwu] from comment #33)

> This needs to be documented if you intend to have a limit. This essentially
> means that this library cannot be used within gecko code - gecko code can
> easily reach hundreds of open FDs.

Exactly! This is really just a minimal library for system daemons. It is *not* intended to be used in Gecko and I'll f- any patch that tries to do so. In Gecko we have libevent and related data structures.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> (In reply to Michael Wu [:mwu] from comment #33)
> 
> > This needs to be documented if you intend to have a limit. This essentially
> > means that this library cannot be used within gecko code - gecko code can
> > easily reach hundreds of open FDs.
> 
> Exactly! This is really just a minimal library for system daemons. It is
> *not* intended to be used in Gecko and I'll f- any patch that tries to do
> so. In Gecko we have libevent and related data structures.

Hmm... still seems really odd to me to not be able to monitor more fds if we happen to have more than 64 fds open. Seems like the kind of thing that'll bite you or someone else down the line. Up to you though, as long as this is documented.
Comment on attachment 8446980 [details]
libfdio repository

I think I've read this code enough times to stamp it.

Some thoughts from reading this -

- Operators taking two operands often don't have a space before and after, but not always. I usually prefer to consistently add a space before and after those operands.

- A lot of this would be simplified in C++. The interface could remain in C.

- Using an init callback in epoll_loop still seems very weird. If we had something like epoll_init and epoll_uninit, like init_task_queue and uninit_task in task.h, we wouldn't have to do that.

- The function pointer type used in fd_events could use a typedef.
Attachment #8446980 - Flags: review?(mwu) → review+
Also, clear_fd_state seems like it should just use memset.
(In reply to Michael Wu [:mwu] from comment #37)
> Also, clear_fd_state seems like it should just use memset.

That's what I originally proposed, but Thomas didn't like the code memset() adds.
I would be in favor of memset() as well, since you won't forget to change the cleanup routine when the struct changes.
(In reply to Michael Wu [:mwu] from comment #35)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> > (In reply to Michael Wu [:mwu] from comment #33)
> > 
> > > This needs to be documented if you intend to have a limit. This essentially
> > > means that this library cannot be used within gecko code - gecko code can
> > > easily reach hundreds of open FDs.
> > 
> > Exactly! This is really just a minimal library for system daemons. It is
> > *not* intended to be used in Gecko and I'll f- any patch that tries to do
> > so. In Gecko we have libevent and related data structures.
> 
> Hmm... still seems really odd to me to not be able to monitor more fds if we
> happen to have more than 64 fds open. Seems like the kind of thing that'll
> bite you or someone else down the line. Up to you though, as long as this is
> documented.

Let me see how much effort this is. I don't what to complicate code unnecessarily, but if there's an easy way, I'll probably add dynamic allocation for these data structures.
Hi Michael,

Thanks for the review.

(In reply to Michael Wu [:mwu] from comment #36)
> Comment on attachment 8446980 [details]
> libfdio repository
> 
> I think I've read this code enough times to stamp it.
> 
> Some thoughts from reading this -
> 
> - Operators taking two operands often don't have a space before and after,
> but not always. I usually prefer to consistently add a space before and
> after those operands.

No problem, I can clean this up. I usually use spaces to make precedence more obvious. For example, in

  a = b&c

the bitwise 'and' is executed first, and the assignment afterwards.

> - A lot of this would be simplified in C++. The interface could remain in C.
> 
> - Using an init callback in epoll_loop still seems very weird. If we had
> something like epoll_init and epoll_uninit, like init_task_queue and
> uninit_task in task.h, we wouldn't have to do that.

I'd rather get rid of init_task_queue/uninit_task_queue. :)

The reason I avoid init/uninit functions here is that someone could try to run epoll_loop without epoll_init, or forget to run epoll_uninit afterwards. If you only have epoll_loop, it's much harder to misuse.

> - The function pointer type used in fd_events could use a typedef.

In general, I don't like typedefs, because they often obfuscate code. Function pointers might be an exception in some cases. [1], Sec 3.5 has some good arguments.

[1] http://www.linuxsymposium.org/archives/OLS/Reprints-2002/kroahhartman-reprint.pdf
(In reply to Michael Wu [:mwu] from comment #37)
> Also, clear_fd_state seems like it should just use memset.

I recognize that clearing variables (to zero) can be safer in some situations. But on the other hand, the value is chosen (almost) arbitrarily and there's no guarantee that it's actually the safe value in all cases.

But the reason why I don't like memset is similar to the typedef argument above. Memset doesn't 'talk to the programmer' about what happens in the program. Explicitly cleaning the fields that matter does so.
Comment on attachment 8446980 [details]
libfdio repository

Updated github repository:

  - merged support for returning results from I/O callbacks
  - dynamically allocate memory for fd states; remove MAXNFDS limit
  - support uninit callback for cleaning up at the end of the event loop
  - support logging on pre-JB platforms
  - code cleanups
Comment on attachment 8453101 [details]
Support return values in callback functions

Merged.
Attachment #8453101 - Attachment is obsolete: true
No longer blocks: 1005934
Landed in bug 1100270.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Depends on: 1100270
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.